Applied Econometric 
Time Series 


WALTER ENDERS 


Iowa State University


JOHN WILEY & SONS, INC.




PREFACE 


This book was born out of frustration. After returning from an enjoyable and
productive sabbatical at the University of California at San Diego, I began
expanding the empirical content of my graduate-level classes in macroeconomics
and international finance. Students’ interest surged as they began to understand
the concurrent development of macroeconomic theory and time-series econometrics.
The differences among Keynesians, monetarists, the rational expectations school,
and the real business cycle approach could best be understood by their ability
to explain the empirical regularities in the economy. Old-style macroeconomic
models were discarded because of their empirical inadequacies, not because of
any logical inconsistencies.

Iowa State University has a world-class Statistics Department, and most of our
economics students take three or four statistics classes. Nevertheless, students’
backgrounds were inadequate for the empirical portion of my courses. I needed to
present a reasonable number of lectures on the topics covered in this book. My
frustration was that the journal articles were written for those already
technically proficient in time-series econometrics. The existing time-series
texts were inadequate to the task. Some focused on forecasting, others on
theoretical econometric issues, and still others on techniques that are
infrequently used in the economics literature. The idea for this text began as
my class notes and use of handouts grew inordinately. Finally, I began teaching
a new course in applied time-series econometrics.

My original intent was to write a text on time-series macroeconometrics.
Fortunately, my colleagues at Iowa State convinced me to broaden the focus;
applied microeconomists were also embracing time-series methods. I decided to
include examples drawn from agricultural economics, international finance, and
some of my work with Todd Sandler on the study of transnational terrorism. You
should find the examples in the text to provide a reasonable balance between
macroeconomic and microeconomic applications.

The text is intended for those with some background in multiple regression
analysis. I presume the reader understands the assumptions underlying the use of
ordinary least squares. All of my students are familiar with the concepts of
correlation and covariation; they also know how to use t-tests and F-tests
within a regression framework. I use terms such as mean square error,
significance level, and unbiased estimate without explaining their meaning. The
last two chapters of the text examine multiple time-series techniques. To work
through these chapters, it is necessary to know how to solve a system of
equations using matrix algebra. Chapter 1, entitled “Difference Equations,” is
the cornerstone of the text. In my experience, this material and a knowledge of
regression are sufficient to bring students to the point where they are able to
read the professional journals and to embark on a serious applied study.




I believe in teaching by induction. The method is to take a simple example and
build towards more general and more complicated models and econometric
procedures. Detailed examples of each procedure are provided. Each concludes
with a step-by-step summary of the stages typically employed in using that
procedure. The approach is one of learning by doing. A large number of solved
problems are included in the body of each chapter. The Questions and Exercises
at the end of each chapter are especially important. They have been designed to
complement the material in the text. In order to work through the exercises, it
is necessary to have access to a software package such as RATS, SAS, SHAZAM, or
TSP. Matrix packages such as MATLAB and GAUSS are not as convenient for
univariate models. Packages such as MINITAB, SPSSX, and MICROFIT can perform
many of the procedures covered in the exercises. You are encouraged to work
through as many of the examples and exercises as possible. The answers to all
questions are contained in the Instructor’s Manual. Most of the questions are
answered in great detail. In addition, the Instructor’s Manual contains the data
disk and the computer programs that can be used to answer the end-of-chapter
exercises. Programs are provided for the most popular software packages.

In spite of all my efforts, some errors have undoubtedly crept into the text.
Portions of the manuscript that are crystal clear to me will surely be opaque to
others. Towards this end, I plan to keep a list of corrections and
clarifications. You can receive a copy (of what I hope is a short list) from my
Internet address ENDERS@IASTATE.EDU.


Many people made valuable suggestions for improving the manuscript. I am
grateful to my students who kept me challenged and were quick to point out
errors. Pin Chung was especially helpful in carefully reading the many drafts of
the manuscript and ferreting out numerous mistakes. Selahattin Dibooglu at the
University of Illinois at Carbondale and Harvey Cutler at Colorado State
University used portions of the text in their own courses; their comments
concerning the organization, style, and clarity of presentation are much
appreciated. My colleague Barry Falk was more than willing to answer my
questions and make helpful suggestions. Hae-Shin Hwang, Texas A and M
University; Paul D. McNelis, Georgetown University; Hadi Esfahani, University of
Illinois; M. Daniel Westbrook, Georgetown University; Beth Ingram, University of
Iowa; and Subhash C. Ray, University of Connecticut all provided insightful
reviews of various stages of the manuscript. Julio Herrera and Nifacio Velasco,
the “food gurus” at the University of Valladolid, helped me survive the final
stages of proofreading. Most of all, I would like to thank my loving wife Linda
for putting up with me while I was working on the text.


CHAPTER 1: Difference Equations
1. Time-Series Models
2. Difference Equations and Their Solutions
3. Solution by Iteration
4. An Alternative Solution Methodology
5. The Cobweb Model
6. Solving Homogeneous Difference Equations
7. Finding Particular Solutions for Deterministic Processes
8. The Method of Undetermined Coefficients
9. Lag Operators
10. Forward- Versus Backward-Looking Solutions
Summary and Conclusions
Questions and Exercises
Endnotes
Appendix 1: Imaginary Roots and de Moivre’s Theorem
Appendix 2: Characteristic Roots in Higher-Order Equations

CHAPTER 2: Stationary Time-Series Models
1. Stochastic Difference Equation Models
2. ARMA Models
3. Stationarity
4. Stationarity Restrictions for an ARMA(p, q) Model
5. The Autocorrelation Function
6. The Partial Autocorrelation Function
7. Sample Autocorrelations of Stationary Series
8. Box-Jenkins Model Selection
9. The Forecast Function
10. A Model of the WPI
11. Seasonality
Summary and Conclusions
Questions and Exercises
Endnotes
Appendix: Expected Values and Variance


CHAPTER 3: Modeling Economic Time Series: Trends and Volatility
1. Economic Time Series: The Stylized Facts
2. ARCH Processes
3. ARCH and GARCH Estimates of Inflation
4. Estimating a GARCH Model of the WPI: An Example
5. A GARCH Model of Risk
6. The ARCH-M Model
7. Maximum Likelihood Estimation of GARCH and ARCH-M Models
8. Deterministic and Stochastic Trends
9. Removing the Trend
10. Are There Business Cycles?
11. Stochastic Trends and Univariate Decompositions
Summary and Conclusions
Questions and Exercises
Endnotes
Appendix: Signal Extraction and Minimum Mean Square Errors


CHAPTER 4: Testing for Trends and Unit Roots
1. Unit Root Processes
2. Dickey-Fuller Tests
3. Extensions of the Dickey-Fuller Tests
4. Examples of the Augmented Dickey-Fuller Test
5. Phillips-Perron Tests
6. Structural Change
7. Problems in Testing for Unit Roots
Summary and Conclusions
Questions and Exercises
Endnotes
Appendix: Phillips-Perron Test Statistics


CHAPTER 5: Multiequation Time-Series Models
1. Intervention Analysis
2. Transfer Function Models
3. Estimating a Transfer Function
4. Limits to Structural Multivariate Estimation
5. Introduction to VAR Analysis
6. Estimation and Identification
7. The Impulse Response Function
8. Hypothesis Testing
9. Example of a Simple VAR: Terrorism and Tourism in Spain
10. Structural VARs
11. Examples of Structural Decompositions
12. The Blanchard and Quah Decomposition
13. Decomposing Real and Nominal Exchange Rate Movements: An Example
Summary and Conclusions
Questions and Exercises
Endnotes




CHAPTER 6: Cointegration and Error-Correction Models
1. Linear Combinations of Integrated Variables
2. Cointegration and Common Trends
3. Cointegration and Error Correction
4. Testing for Cointegration: The Engle-Granger Methodology
5. Illustrating the Engle-Granger Methodology
6. Cointegration and Purchasing-Power Parity
7. Characteristic Roots, Rank, and Cointegration
8. Hypothesis Testing in a Cointegration Framework
9. Illustrating the Johansen Methodology
10. Generalized Purchasing-Power Parity
Summary and Conclusions
Questions and Exercises
Endnotes
Appendix


STATISTICAL TABLES
A. Empirical Distributions of the $\tau$ Statistics
B. Empirical Distributions of the $\phi$ Statistics
C. Empirical Distributions of the $\lambda_{max}$ and $\lambda_{trace}$ Statistics

REFERENCES
AUTHOR INDEX
SUBJECT INDEX




Chapter 1

DIFFERENCE EQUATIONS


The theory of difference equations underlies all the time-series methods
employed in later chapters of this text. It is fair to say that time-series
econometrics is concerned with the estimation of difference equations containing
stochastic components. The traditional use of time-series analysis was to
forecast the time path of a variable. Uncovering the dynamic path of a series
improves forecasts since the predictable components of the series can be
extrapolated into the future. The growing interest in economic dynamics has
given a new emphasis to time-series econometrics. Stochastic difference
equations arise quite naturally from dynamic economic models. Appropriately
estimated equations can be used for the interpretation of economic data and for
hypothesis testing.
The aims of this introductory chapter are to: 


1. Explain how stochastic difference equations can be used for forecasting and
illustrate how such equations can arise from familiar economic models. The
chapter is not meant to be a treatise on the theory of difference equations.
Only those techniques that are essential to the appropriate estimation of linear
time-series models are presented. This chapter focuses on single-equation
models; multivariate models are considered in Chapters 5 and 6.

2. Explain what it means to “solve” a difference equation. The solution will
determine whether a variable has a stable or an explosive time path. A knowledge
of the stability conditions is essential to understanding the recent innovations
in time-series econometrics. The contemporary time-series literature pays
special attention to the issue of stationary versus nonstationary variables. The
stability conditions underlie the conditions for stationarity.

3. Demonstrate how to find the solution to a stochastic difference equation.
There are several different techniques that can be used; each has its own
relative merits. A number of examples are presented to help you understand the
different methods. Try to work through each example carefully. For extra
practice, you should complete the exercises at the end of the chapter.




1. TIME-SERIES MODELS 


The task facing the modern time-series econometrician is to develop reasonably
simple models capable of forecasting, interpreting, and testing hypotheses
concerning economic data. The challenge has grown over time; the original use of
time-series analysis was primarily as an aid to forecasting. As such, a
methodology was developed to decompose a series into a trend, a seasonal, a
cyclical, and an irregular component. Uncovering the dynamic path of a series
improves forecast accuracy since each of the predictable components can be
extrapolated into the future. Suppose you observe the 50 data points shown in
Figure 1.1 and are interested in forecasting the subsequent values. By using the
time-series methods discussed in the next several chapters, it is possible to
decompose this series into the trend, seasonal, and irregular components shown
in the lower part of the figure. As you can see, the trend changes the mean of
the series and the seasonal component imparts a regular cyclical pattern with
peaks occurring every 12 units of time. In practice, the trend and seasonal
components will not be the simplistic deterministic functions shown in the
figure. With economic data, it is typical to find that a series contains
stochastic elements in the trend, seasonal, and irregular components. For the
time being, it is wise to sidestep these complications so that the projection of
the trend and seasonal components into periods 51 and beyond is straightforward.

Figure 1.1 Hypothetical time series. [Upper panel: observed data (periods 1 to
50) and forecast. Lower panel: trend, seasonal, and irregular components with
their forecasts. Horizontal axis: time, 0 to 80.]

Notice that the irregular component, while not having a well-defined pattern, is
somewhat predictable. If you examine the figure closely, you will see that the
positive and negative values occur in runs; the occurrence of a large value in
any period tends to be followed by another large value. Short-run forecasts will
make use of this positive correlation in the irregular component. Over the
entire span, however, the irregular component exhibits a tendency to revert to
zero. As shown in the lower part of the figure, the projection of the irregular
component past period 50 rapidly decays toward zero. The overall forecast, shown
in the top part of the figure, is the sum of each forecasted component.

The general methodology used to make such forecasts entails finding the
“equation of motion” driving a stochastic process and using that equation to
predict subsequent outcomes. Let $y_t$ denote the value of a data point at
period $t$; if we use this notation, the example in Figure 1.1 assumed we
observed $y_1$ through $y_{50}$. For $t = 1$ to 50, the equations of motion used
to construct components of the $y_t$ series are

Trend: $T_t = 1 + 0.1t$
Seasonal: $S_t = 1.6 \sin(t\pi/6)$
Irregular: $I_t = 0.7 I_{t-1} + \epsilon_t$

where $T_t$ = value of the trend component in period $t$
$S_t$ = value of the seasonal component in $t$
$I_t$ = value of the irregular component in $t$
$\epsilon_t$ = a pure random disturbance in $t$

Thus, the irregular disturbance in $t$ is 70% of the previous period’s irregular
disturbance plus a random disturbance term.
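As a rough illustration of how the three equations of motion generate and extend a series, the sketch below builds 50 observations and projects each component forward. The standard normal disturbances, the starting value $I_0 = 0$, the seed, and the function names are assumptions made for this example; the seasonal term is given a 12-period cycle, matching the description of peaks occurring every 12 units of time.

```python
import math
import random


def simulate_components(n_obs=50, rho=0.7, seed=0):
    """Build the trend, seasonal, and irregular components for t = 1..n_obs."""
    rng = random.Random(seed)
    trend, seasonal, irregular = [], [], []
    prev_irregular = 0.0  # assumed starting value I_0 = 0
    for t in range(1, n_obs + 1):
        trend.append(1 + 0.1 * t)
        seasonal.append(1.6 * math.sin(t * math.pi / 6))
        # I_t = 0.7 * I_{t-1} + e_t, with e_t drawn as N(0, 1) by assumption
        prev_irregular = rho * prev_irregular + rng.gauss(0.0, 1.0)
        irregular.append(prev_irregular)
    return trend, seasonal, irregular


def forecast(last_irregular, n_obs=50, horizon=30, rho=0.7):
    """Project each component past period n_obs and sum the projections."""
    rows = []
    for j in range(1, horizon + 1):
        t = n_obs + j
        trend_f = 1 + 0.1 * t
        seasonal_f = 1.6 * math.sin(t * math.pi / 6)
        # Future disturbances have mean zero, so the projection is rho**j * I_50
        irregular_f = (rho ** j) * last_irregular
        rows.append((t, trend_f, seasonal_f, irregular_f,
                     trend_f + seasonal_f + irregular_f))
    return rows


trend, seasonal, irregular = simulate_components()
y = [a + b + c for a, b, c in zip(trend, seasonal, irregular)]
forecasts = forecast(irregular[-1])
```

The projected irregular component shrinks geometrically with the horizon, which is the decay toward zero described for the lower panel of Figure 1.1.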

Each of these three equations is a type of difference equation. In its most
general form, a difference equation expresses the value of a variable as a
function of its own lagged values, time, and other variables. The trend and
seasonal terms are both functions of time and the irregular term is a function
of its own lagged value and the stochastic variable $\epsilon_t$. The reason for
introducing this set of equations is to make the point that time-series
econometrics is concerned with the estimation of difference equations containing
stochastic components. The time-series econometrician may estimate the
properties of a single series or a vector containing many interdependent series.
Both univariate and multivariate forecasting methods are presented in the text.
Chapter 2 shows how to estimate the irregular part of a series. The first half
of Chapter 3 considers estimating the variance when the data exhibit periods of
volatility and tranquility. Estimation of the trend is considered in the last
half of Chapter 3 and in Chapter 4. Chapter 4 pays particular attention to the
issue of whether the trend is deterministic or stochastic. Chapter 5 discusses
the properties of a vector of stochastic difference equations and Chapter 6 is
concerned with the estimation of trends in a multivariate model.


Although forecasting was the mainstay of time-series analysis, the growing
importance of economic dynamics has generated new uses for time-series analysis.
Many economic theories have natural representations as stochastic difference
equations. Moreover, many of these models have testable implications concerning
the time path of a key economic variable. Consider the following three examples.


1. The Random Walk Hypothesis: In its simplest form, the random walk model
suggests that day-to-day changes in the price of a stock should have a mean
value of zero. After all, if it is known that a capital gain can be made by
buying a share on day $t$ and selling it for an expected profit the very next
day, efficient speculation will drive up the current price. Similarly, no one
will want to hold a stock if it is expected to depreciate. Formally, the model
asserts that the price of a stock should evolve according to the stochastic
difference equation:

$$y_{t+1} = y_t + \epsilon_{t+1}$$

or

$$\Delta y_{t+1} = \epsilon_{t+1}$$

where $y_t$ = the price of a share of stock on day $t$
$\epsilon_{t+1}$ = a random disturbance term that has an expected value of zero

Now consider the more general stochastic difference equation:

$$\Delta y_{t+1} = \alpha_0 + \alpha_1 y_t + \epsilon_{t+1}$$

The random walk hypothesis requires the testable restriction $\alpha_0 =
\alpha_1 = 0$. Rejecting this restriction is equivalent to rejecting the theory.
Given the information available in period $t$, the theory also requires that the
mean of $\epsilon_{t+1}$ be equal to zero; evidence that $\epsilon_{t+1}$ is
predictable invalidates the random walk hypothesis. Again, the appropriate
estimation of a single-equation model is considered in Chapters 2 through 4.
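To make the restriction $\alpha_0 = \alpha_1 = 0$ concrete, the following sketch simulates a random walk and fits the more general equation by ordinary least squares. The starting price, the disturbance variance, the seed, and the helper names are assumptions chosen for illustration. The sketch only computes point estimates; formal inference on $\alpha_1$ under the random walk null needs the Dickey-Fuller tests taken up in Chapter 4 rather than the usual t-tables.

```python
import random


def simulate_random_walk(n, y0=100.0, sigma=1.0, seed=1):
    """Generate y_0, ..., y_n from y_{t+1} = y_t + e_{t+1}."""
    rng = random.Random(seed)
    y = [y0]
    for _ in range(n):
        y.append(y[-1] + rng.gauss(0.0, sigma))
    return y


def ols_simple(x, z):
    """OLS of z on a constant and x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return mean_z - slope * mean_x, slope


y = simulate_random_walk(500)
dy = [y[t + 1] - y[t] for t in range(len(y) - 1)]   # Delta y_{t+1}
alpha0, alpha1 = ols_simple(y[:-1], dy)             # regress on a constant and y_t
residuals = [d - alpha0 - alpha1 * lag for d, lag in zip(dy, y[:-1])]
```

With data that truly follow a random walk, the estimated slope should be close to zero, and the residuals stand in for the unpredictable disturbances $\epsilon_{t+1}$.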


2. Reduced Forms and Structural Equations: Often, it is useful to collapse a
system of difference equations into separate single-equation models. To
illustrate the key issues involved, consider a stochastic version of
Samuelson’s (1939) classic model:

$$y_t = c_t + i_t \tag{1.1}$$
$$c_t = \alpha y_{t-1} + \epsilon_{ct}, \quad 0 < \alpha < 1 \tag{1.2}$$
$$i_t = \beta(c_t - c_{t-1}) + \epsilon_{it}, \quad \beta > 0 \tag{1.3}$$

where $y_t$, $c_t$, and $i_t$ denote real GNP, consumption, and investment in
time period $t$, respectively. In this Keynesian model, $y_t$, $c_t$, and $i_t$
are endogenous variables. The previous period’s GNP and consumption, $y_{t-1}$
and $c_{t-1}$, are called predetermined or lagged endogenous variables. The
terms $\epsilon_{ct}$ and $\epsilon_{it}$ are zero mean random disturbances for
consumption and investment and the coefficients $\alpha$ and $\beta$ are
parameters to be estimated.

The first equation equates aggregate output (GNP) with the sum of consumption
and investment spending. The second equation asserts that consumption spending
is proportional to the previous period’s income plus a random disturbance term.
The third equation illustrates the accelerator principle. Investment spending is
proportional to the change in consumption; the idea is that growth in
consumption necessitates new investment spending. The error terms
$\epsilon_{ct}$ and $\epsilon_{it}$ represent the portions of consumption and
investment not explained by the behavioral equations of the model.

Equation (1.3) is a structural equation since it expresses the endogenous
variable $i_t$ as being dependent on the current realization of another
endogenous variable, $c_t$. A reduced-form equation is one expressing the value
of a variable in terms of its own lags, lags of other endogenous variables,
current and past values of exogenous variables, and disturbance terms. As
formulated, the consumption function is already in reduced form; current
consumption depends only on lagged income and the current value of the
stochastic disturbance term $\epsilon_{ct}$. Investment is not in reduced form
since it depends on current period consumption.

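To derive a reduced-form equation for investment, substitute (1.2) into the
investment equation to obtain

$$i_t = \beta(\alpha y_{t-1} + \epsilon_{ct} - c_{t-1}) + \epsilon_{it}$$
$$\;\;\, = \alpha\beta y_{t-1} - \beta c_{t-1} + \beta\epsilon_{ct} + \epsilon_{it}$$

Notice that the reduced-form equation for investment is not unique. You can lag
(1.2) one period to obtain $c_{t-1} = \alpha y_{t-2} + \epsilon_{ct-1}$. Using
this expression, we can also write the reduced-form investment equation as

$$i_t = \alpha\beta y_{t-1} - \beta(\alpha y_{t-2} + \epsilon_{ct-1}) + \beta\epsilon_{ct} + \epsilon_{it}$$
$$\;\;\, = \alpha\beta(y_{t-1} - y_{t-2}) + \beta(\epsilon_{ct} - \epsilon_{ct-1}) + \epsilon_{it} \tag{1.4}$$

Similarly, a reduced-form equation for GNP can be obtained by substituting (1.2)
and (1.4) into (1.1):

$$y_t = \alpha y_{t-1} + \epsilon_{ct} + \alpha\beta(y_{t-1} - y_{t-2}) + \beta(\epsilon_{ct} - \epsilon_{ct-1}) + \epsilon_{it}$$
$$\;\;\, = \alpha(1 + \beta)y_{t-1} - \alpha\beta y_{t-2} + (1 + \beta)\epsilon_{ct} + \epsilon_{it} - \beta\epsilon_{ct-1} \tag{1.5}$$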

Equation (1.5) is a univariate reduced-form equation; $y_t$ is expressed solely
as a function of its own lags and disturbance terms. A univariate model is
particularly useful for forecasting since it enables you to predict a series
based solely on its own current and past realizations. It is possible to
estimate (1.5) using the univariate time-series techniques explained in Chapters
2 through 4. Once you obtain estimates of $\alpha$ and $\beta$, it is
straightforward to use the observed values of $y_1$ through $y_t$ to predict all
future values in the series (i.e., $y_{t+1}, y_{t+2}, \ldots$).

Chapter 5 considers the estimation of multivariate models when all variables are
treated as jointly endogenous. The chapter also discusses the restrictions
needed to recover (i.e., identify) the structural model from the estimated
reduced-form model.
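One way to check the algebra leading to (1.5) is to simulate the three structural equations and confirm that the resulting GNP series satisfies the univariate reduced form period by period. The sketch below does this under assumed parameter values ($\alpha = 0.8$, $\beta = 0.5$), assumed start-up values, and standard normal disturbances; none of these numbers come from the text.

```python
import random


def simulate_samuelson(alpha=0.8, beta=0.5, n=40, seed=2):
    """Iterate (1.1)-(1.3) forward from assumed start-up values."""
    rng = random.Random(seed)
    y = [100.0, 100.0]               # assumed y_0 and y_1
    e_c = [0.0, 0.0]                 # consumption disturbances
    e_i = [0.0, 0.0]                 # investment disturbances
    c = [alpha * 100.0, alpha * y[0] + e_c[1]]   # c_1 obeys (1.2)
    for t in range(2, n):
        e_c.append(rng.gauss(0.0, 1.0))
        e_i.append(rng.gauss(0.0, 1.0))
        c_t = alpha * y[t - 1] + e_c[t]           # (1.2)
        i_t = beta * (c_t - c[t - 1]) + e_i[t]    # (1.3)
        c.append(c_t)
        y.append(c_t + i_t)                       # (1.1)
    return y, e_c, e_i


def reduced_form_gnp(alpha, beta, y, e_c, e_i, t):
    """Right-hand side of (1.5) for period t."""
    return (alpha * (1 + beta) * y[t - 1] - alpha * beta * y[t - 2]
            + (1 + beta) * e_c[t] + e_i[t] - beta * e_c[t - 1])


alpha, beta = 0.8, 0.5
y, e_c, e_i = simulate_samuelson(alpha, beta)
max_gap = max(abs(y[t] - reduced_form_gnp(alpha, beta, y, e_c, e_i, t))
              for t in range(2, len(y)))
```

If the derivation is correct, `max_gap` differs from zero only by floating-point rounding, since (1.5) is an identity implied by the structural equations.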


3. Error Correction: Forward and Spot Prices. Certain commodities and financial
instruments can be bought and sold on the spot market for immediate delivery or
for delivery at some specified future date. For example, suppose that the price
of a particular foreign currency on the spot market is $s_t$ dollars and the
price of the currency for delivery one period into the future is $f_t$ dollars.
Now, consider a speculator who purchased forward currency at the price $f_t$
dollars per unit. At the beginning of period $t + 1$, the speculator receives
the currency and pays $f_t$ dollars per unit received. Since spot foreign
exchange can be sold at $s_{t+1}$, the speculator can earn a profit (or loss) of
$s_{t+1} - f_t$ per unit transacted.

The unbiased forward rate (UFR) hypothesis asserts that expected profits from
such speculative behavior should be zero. Formally, the hypothesis posits the
following relationship between forward and spot exchange rates:

$$s_{t+1} = f_t + \epsilon_{t+1} \tag{1.6}$$

where $\epsilon_{t+1}$ has a mean value of zero from the perspective of time
period $t$.

In (1.6), the forward rate in $t$ is an unbiased estimate of the spot rate in
$t + 1$. Thus, suppose you collected data on the two rates and estimated the
regression:

$$s_{t+1} = \alpha_0 + \alpha_1 f_t + \epsilon_{t+1}$$

If you were able to conclude that $\alpha_0 = 0$, $\alpha_1 = 1$, and the
regression residuals $\epsilon_{t+1}$ have a mean value of zero from the
perspective of time period $t$, the UFR hypothesis could be maintained.

The spot and forward markets are said to be in “long-run equilibrium” when
$\epsilon_{t+1} = 0$. Whenever $s_{t+1}$ turns out to differ from $f_t$, some
sort of adjustment must occur to restore the equilibrium in the subsequent
period. Consider the adjustment process:

$$s_{t+2} = s_{t+1} - \alpha(s_{t+1} - f_t) + \epsilon_{st+2}, \quad \alpha > 0 \tag{1.7}$$
$$f_{t+1} = f_t + \beta(s_{t+1} - f_t) + \epsilon_{ft+1}, \quad \beta > 0 \tag{1.8}$$

where $\epsilon_{st+2}$ and $\epsilon_{ft+1}$ both have a mean value of zero
from the perspective of time period $t + 1$ and $t$, respectively.

Equations (1.7) and (1.8) illustrate the type of simultaneous adjustment
mechanism considered in Chapter 6. This dynamic model is called an
error-correction model since the movement of the variables in any period is
related to the previous period’s gap from long-run equilibrium. If the spot rate
$s_{t+1}$ turns out to equal the forward rate $f_t$, (1.7) and (1.8) state that
the spot and forward rates are expected to remain unchanged. If there is a
positive gap between the spot and forward rates so that $s_{t+1} - f_t > 0$,
(1.7) and (1.8) lead to the prediction that the spot rate will fall and the
forward rate will rise.
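Subtracting (1.8) from (1.7) shows how the adjustment works: the gap obeys $s_{t+2} - f_{t+1} = (1 - \alpha - \beta)(s_{t+1} - f_t) + \epsilon_{st+2} - \epsilon_{ft+1}$, so with the disturbances at their mean of zero the gap is scaled by $(1 - \alpha - \beta)$ each period. The sketch below traces this out; the values $\alpha = 0.3$, $\beta = 0.2$, the starting rates, and the function name are assumptions for illustration only.

```python
def simulate_gap(alpha=0.3, beta=0.2, spot=1.10, forward=1.00, periods=10):
    """Iterate (1.7) and (1.8) with both disturbances set to zero.

    `spot` plays the role of s_{t+1} and `forward` the role of f_t.
    Returns the spot path, the forward path, and the gap in each period.
    """
    spots, forwards, gaps = [spot], [forward], [spot - forward]
    for _ in range(periods):
        gap = spots[-1] - forwards[-1]
        next_spot = spots[-1] - alpha * gap        # (1.7): spot corrects downward
        next_forward = forwards[-1] + beta * gap   # (1.8): forward corrects upward
        spots.append(next_spot)
        forwards.append(next_forward)
        gaps.append(next_spot - next_forward)
    return spots, forwards, gaps


spots, forwards, gaps = simulate_gap()
```

Starting from a positive gap, the spot rate falls and the forward rate rises in each step, and under these assumed values the gap halves every period because $1 - 0.3 - 0.2 = 0.5$.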


2. DIFFERENCE EQUATIONS AND THEIR SOLUTIONS 


Although many of the ideas in the previous section were probably familiar to
you, it is necessary to formalize some of the concepts used. In this section, we
will examine the type of difference equation used in econometric analysis and
make explicit what it means to “solve” such equations. To begin our examination
of difference equations, consider the function $y = f(t)$. If we evaluate the
function when the independent variable $t$ takes on the specific value $t^*$, we
get a specific value for the dependent variable called $y_{t^*}$. Formally,
$y_{t^*} = f(t^*)$. If we use this same notation, $y_{t^*+h}$ represents the
value of $y$ when $t$ takes on the specific value $t^* + h$. The first
difference of $y$ is defined to be the value of the function when evaluated at
$t = t^* + h$ minus the value of the function evaluated at $t^*$:

$$\Delta y_{t^*+h} \equiv f(t^* + h) - f(t^*) \equiv y_{t^*+h} - y_{t^*} \tag{1.9}$$

Differential calculus allows the change in the independent variable (i.e., the
term $h$) to approach zero. Since most economic data are collected over discrete
periods, however, it is more useful to allow the length of the time period to be
greater than zero. Using difference equations, we normalize units so that $h$
represents a unit change in $t$ (i.e., $h = 1$) and consider the sequence of
equally spaced values of the independent variable. Without any loss of
generality, we can always drop the asterisk on $t^*$. We can then form the first
differences:

$$\Delta y_t = f(t) - f(t - 1) \equiv y_t - y_{t-1}$$
$$\Delta y_{t+1} = f(t + 1) - f(t) \equiv y_{t+1} - y_t$$
$$\Delta y_{t+2} = f(t + 2) - f(t + 1) \equiv y_{t+2} - y_{t+1}$$

Often, it will be convenient to express the entire sequence of values
$\{\ldots, y_{t-2}, y_{t-1}, y_t, y_{t+1}, y_{t+2}, \ldots\}$ as $\{y_t\}$. We
can then refer to any one particular value in the sequence as $y_t$. Unless
specified, the index $t$ runs from $-\infty$ to $+\infty$. In time-series
econometric models, we will use $t$ to represent “time” and $h$ the length of a
time period. Thus, $y_t$ and $y_{t+1}$ might represent the realizations of the
$\{y_t\}$ sequence in the first and second quarters of 1995, respectively.
In the same way, we can form the second difference as the change in the first
difference. Consider

$$\Delta^2 y_t \equiv \Delta(\Delta y_t) = \Delta(y_t - y_{t-1}) = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}$$
$$\Delta^2 y_{t+1} \equiv \Delta(\Delta y_{t+1}) = \Delta(y_{t+1} - y_t) = (y_{t+1} - y_t) - (y_t - y_{t-1}) = y_{t+1} - 2y_t + y_{t-1}$$

The $n$th difference ($\Delta^n$) is defined analogously. At this point, we risk
taking the theory of difference equations too far. As you will see, the need to
use second differences rarely arises in time-series analysis. It is safe to say
that third- and higher-order differences are never used in applied work.
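The definitions of the first and second differences translate directly into a few lines of code. The sketch below applies them to an assumed example sequence, $y_t = t^2$; the function name and the data are illustrative choices rather than anything from the text.

```python
def difference(seq, order=1):
    """Apply the difference operator `order` times to a list of values."""
    out = list(seq)
    for _ in range(order):
        out = [out[k] - out[k - 1] for k in range(1, len(out))]
    return out


y = [t ** 2 for t in range(6)]      # 0, 1, 4, 9, 16, 25
first = difference(y)               # y_t - y_{t-1}
second = difference(y, order=2)     # change in the first difference

# The second difference should equal y_t - 2*y_{t-1} + y_{t-2}
direct = [y[t] - 2 * y[t - 1] + y[t - 2] for t in range(2, len(y))]
```

For this quadratic sequence the first differences are 1, 3, 5, 7, 9 and the second differences are constant at 2, and `second` matches the expanded formula in `direct`.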

Since this text considers linear time-series methods, it is possible to examine
only the special case of an $n$th-order linear difference equation with constant
coefficients. The form for this special type of difference equation is given by

$$y_t = a_0 + \sum_{i=1}^{n} a_i y_{t-i} + x_t \tag{1.10}$$

The order of the difference equation is given by the value of $n$. The equation
is linear because all values of the dependent variable are raised to the first
power. Economic theory may dictate instances in which the various $a_i$ are
functions of variables within the economy. However, as long as they do not
depend on any of the values of $y_t$ or $x_t$, we can regard them as parameters.
The term $x_t$ is called the forcing process. The form of the forcing process
can be very general; $x_t$ can be any function of time, current and lagged
values of other variables, and/or stochastic disturbances. By appropriate choice
of the forcing process, we can obtain a wide variety of important macroeconomic
models. Reexamine Equation (1.5), the reduced-form equation for GNP. This
equation is a second-order difference equation since $y_t$ depends on
$y_{t-2}$. The forcing process is the expression $(1 + \beta)\epsilon_{ct} +
\epsilon_{it} - \beta\epsilon_{ct-1}$. You will note that (1.5) has no intercept
term corresponding to the expression $a_0$ in (1.10).
An important special case for the $\{x_t\}$ sequence is

$$x_t = \sum_{i=0}^{\infty} \beta_i \epsilon_{t-i}$$

where the $\beta_i$ are constants (some of which can equal zero) and the
individual elements of the sequence $\{\epsilon_t\}$ are not functions of the
$y_t$. At this point, it is useful to allow the $\{\epsilon_t\}$ sequence to be
nothing more than a sequence of unspecified exogenous variables. For example,
let $\{\epsilon_t\}$ be a random error term and set $\beta_0 = 1$ and $\beta_1 =
\beta_2 = \cdots = 0$; then Equation (1.10) becomes the autoregression equation:

$$y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + \cdots + a_n y_{t-n} + \epsilon_t$$

Let $n = 1$, $a_0 = 0$, and $a_1 = 1$ to obtain the random walk model. Notice
that Equation (1.10) can be written in terms of the difference operator
($\Delta$). Subtracting $y_{t-1}$ from (1.10), we obtain

$$y_t - y_{t-1} = a_0 + (a_1 - 1)y_{t-1} + \sum_{i=2}^{n} a_i y_{t-i} + x_t$$

or defining $\gamma = (a_1 - 1)$, we get

$$\Delta y_t = a_0 + \gamma y_{t-1} + \sum_{i=2}^{n} a_i y_{t-i} + x_t \tag{1.11}$$

Clearly, Equation (1.11) is just a modified version of (1.10).
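A quick numerical check of this equivalence is to iterate the same equation once in levels, as in (1.10), and once in differences, as in (1.11), and compare the two paths. The coefficients, forcing values, and initial conditions below are arbitrary assumptions picked for the example.

```python
def iterate_levels(a0, a, x, y_init):
    """Iterate (1.10): y_t = a0 + sum_i a_i * y_{t-i} + x_t."""
    y = list(y_init)                       # the n most recent starting values
    for x_t in x:
        lags = y[::-1][:len(a)]            # y_{t-1}, y_{t-2}, ..., y_{t-n}
        y.append(a0 + sum(ai * yl for ai, yl in zip(a, lags)) + x_t)
    return y


def iterate_differences(a0, a, x, y_init):
    """Iterate (1.11): Delta y_t = a0 + gamma*y_{t-1} + sum_{i>=2} a_i*y_{t-i} + x_t."""
    gamma = a[0] - 1
    y = list(y_init)
    for x_t in x:
        lags = y[::-1][:len(a)]
        delta = (a0 + gamma * lags[0]
                 + sum(ai * yl for ai, yl in zip(a[1:], lags[1:])) + x_t)
        y.append(lags[0] + delta)          # y_t = y_{t-1} + Delta y_t
    return y


a0, a = 1.0, [0.5, 0.3]                    # assumed second-order example
x = [0.2, -0.1, 0.0, 0.4, -0.3, 0.1]       # assumed forcing process values
levels = iterate_levels(a0, a, x, [1.0, 2.0])
diffs = iterate_differences(a0, a, x, [1.0, 2.0])
max_gap = max(abs(u - v) for u, v in zip(levels, diffs))
```

The two paths agree up to rounding error, which is what one would expect if (1.11) only rearranges (1.10).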

A solution to a difference equation expresses the value of $y_t$ as a function
of the elements of the $\{x_t\}$ sequence and $t$ (and possibly some given
values of the $\{y_t\}$ sequence called initial conditions). Examining (1.11)
makes it clear that there is a strong analogy to integral calculus when the
problem is to find a primitive function from a given derivative. We seek to find
the primitive function $f(t)$ given an equation expressed in the form of (1.10)
or (1.11). Notice that a solution is a function rather than a number. The key
property of a solution is that it satisfies the difference equation for all
permissible values of $t$ and $\{x_t\}$. Thus, the substitution of a solution
into the difference equation must result in an identity. For example, consider
the simple difference equation $\Delta y_t = 2$ (or $y_t = y_{t-1} + 2$). You
can easily verify that a solution to this difference equation is $y_t = 2t + c$,
where $c$ is any arbitrary constant. By definition, if $2t + c$ is a solution,
it must hold for all permissible values of $t$. Thus for period $t - 1$,
$y_{t-1} = 2(t - 1) + c$. Now substitute the solution into the difference
equation to form

$$2t + c \equiv 2(t - 1) + c + 2 \tag{1.12}$$

It is straightforward to carry out the algebra and verify that (1.12) is an
identity. This simple example also illustrates that the solution to a difference
equation need not be unique; there is a solution for any arbitrary value of $c$.
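The same verification can be done mechanically: substitute a candidate function into $y_t = y_{t-1} + 2$ and check that the equality holds at every period tried. The sketch below is only a spot check over an assumed range of periods and a few assumed values of $c$, not a proof; the function names are illustrative.

```python
def candidate(t, c):
    """Proposed solution y_t = 2t + c."""
    return 2 * t + c


def satisfies_equation(func, periods=range(-5, 6)):
    """Check y_t == y_{t-1} + 2 at each period in `periods`."""
    return all(func(t) == func(t - 1) + 2 for t in periods)


# Any constant c yields a solution, so the solution is not unique
checks = {c: satisfies_equation(lambda t, c=c: candidate(t, c))
          for c in (-3, 0, 7)}

# A function that is not a solution, such as y_t = 3t, fails the same check
wrong = satisfies_equation(lambda t: 3 * t)
```

Every entry of `checks` comes out true, while `wrong` is false because $3t \neq 3(t - 1) + 2$.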

Another useful example is provided by the irregular term shown in Figure 1.1; recall that the equation for this expression is $I_t = 0.7I_{t-1} + \varepsilon_t$. You can verify that the solution to this first-order equation is

$$I_t = \sum_{i=0}^{\infty} (0.7)^i \varepsilon_{t-i} \tag{1.13}$$

Since (1.13) holds for all time periods, the value of the irregular component in $t - 1$ is given by

$$I_{t-1} = \sum_{i=0}^{\infty} (0.7)^i \varepsilon_{t-1-i} \tag{1.14}$$


Now substitute (1.13) and (1.14) into $I_t = 0.7I_{t-1} + \varepsilon_t$ to obtain

$$\varepsilon_t + 0.7\varepsilon_{t-1} + (0.7)^2\varepsilon_{t-2} + (0.7)^3\varepsilon_{t-3} + \cdots = 0.7[\varepsilon_{t-1} + 0.7\varepsilon_{t-2} + (0.7)^2\varepsilon_{t-3} + (0.7)^3\varepsilon_{t-4} + \cdots] + \varepsilon_t \tag{1.15}$$

The two sides of (1.15) are identical; this proves that (1.13) is a solution to the first-order stochastic difference equation $I_t = 0.7I_{t-1} + \varepsilon_t$. Be aware of the distinction between reduced-form equations and solutions. Since $I_t = 0.7I_{t-1} + \varepsilon_t$ holds for all values of $t$, it follows that $I_{t-1} = 0.7I_{t-2} + \varepsilon_{t-1}$. Combining these two equations yields

$$\begin{aligned} I_t &= 0.7(0.7I_{t-2} + \varepsilon_{t-1}) + \varepsilon_t \\ &= 0.49I_{t-2} + 0.7\varepsilon_{t-1} + \varepsilon_t \end{aligned} \tag{1.16}$$

Equation (1.16) is a reduced-form equation since it expresses $I_t$ in terms of its own lags and disturbance terms. However, (1.16) does not qualify as a solution since it contains the "unknown" value of $I_{t-2}$. To qualify as a solution, (1.16) must express $I_t$ in terms of the elements of $x_t$, $t$, and any given initial conditions.


3. SOLUTION BY ITERATION


The solution given by (1.13) was simply postulated. The remaining portions of this chapter develop the methods you can use to obtain such solutions. Each method has its own merits; knowing the most appropriate to use in a particular circumstance is a skill that comes only with practice. This section develops the method of iteration. Although iteration is the most cumbersome and time-intensive method, most people find it to be very intuitive.

If the value of $y$ in some specific period is known, a direct method of solution is to iterate forward from that period to obtain the subsequent time path of the entire $y$ sequence. Refer to this known value of $y$ as the initial condition or value of $y$ in time period 0 (denoted by $y_0$). It is easiest to illustrate the iterative technique using the first-order difference equation:

$$y_t = a_0 + a_1 y_{t-1} + \varepsilon_t \tag{1.17}$$

Given the value of $y_0$, it follows that $y_1$ will be given by

$$y_1 = a_0 + a_1 y_0 + \varepsilon_1$$

In the same way, $y_2$ must be

$$\begin{aligned} y_2 &= a_0 + a_1 y_1 + \varepsilon_2 \\ &= a_0 + a_1(a_0 + a_1 y_0 + \varepsilon_1) + \varepsilon_2 \\ &= a_0 + a_0 a_1 + (a_1)^2 y_0 + a_1\varepsilon_1 + \varepsilon_2 \end{aligned}$$

Continuing the process in order to find $y_3$, we obtain

$$\begin{aligned} y_3 &= a_0 + a_1 y_2 + \varepsilon_3 \\ &= a_0[1 + a_1 + (a_1)^2] + (a_1)^3 y_0 + a_1^2\varepsilon_1 + a_1\varepsilon_2 + \varepsilon_3 \end{aligned}$$


You can easily verify that for all $t > 0$, repeated iteration yields

$$y_t = a_0\sum_{i=0}^{t-1} a_1^i + a_1^t y_0 + \sum_{i=0}^{t-1} a_1^i \varepsilon_{t-i} \tag{1.18}$$

Equation (1.18) is a solution to (1.17) since it expresses $y_t$ as a function of $t$, the forcing process $x_t = \sum (a_1)^i \varepsilon_{t-i}$, and the known value of $y_0$. As an exercise, it is useful to show that iteration from $y_t$ back to $y_0$ yields exactly the formula given by (1.18). Since $y_t = a_0 + a_1 y_{t-1} + \varepsilon_t$, it follows that

$$\begin{aligned} y_t &= a_0 + a_1[a_0 + a_1 y_{t-2} + \varepsilon_{t-1}] + \varepsilon_t \\ &= a_0(1 + a_1) + a_1\varepsilon_{t-1} + \varepsilon_t + a_1^2[a_0 + a_1 y_{t-3} + \varepsilon_{t-2}] \end{aligned}$$


Continuing the iteration back to period 0 yields Equation (1.18). 
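The claim that iteration reproduces (1.18) can be checked numerically. The sketch below iterates (1.17) forward from $y_0$ and compares each value with the closed form. The parameter values, the seed, and the use of standard normal draws for $\varepsilon_t$ are assumptions made only for illustration.

```python
import random


def iterate_forward(a0, a1, y0, eps):
    """Iterate y_t = a0 + a1*y_{t-1} + eps_t forward from y_0; eps[t-1] holds eps_t."""
    path = [y0]
    for e in eps:
        path.append(a0 + a1 * path[-1] + e)
    return path


def closed_form(a0, a1, y0, eps, t):
    """Evaluate the right-hand side of (1.18) for period t >= 0."""
    deterministic = a0 * sum(a1 ** i for i in range(t))
    initial = (a1 ** t) * y0
    # eps_{t-i} for i = 0..t-1 is stored at eps[t - i - 1] (zero-based list).
    stochastic = sum((a1 ** i) * eps[t - i - 1] for i in range(t))
    return deterministic + initial + stochastic


random.seed(0)  # assumed seed, only to make the run repeatable
eps = [random.gauss(0, 1) for _ in range(25)]
path = iterate_forward(a0=2.0, a1=0.9, y0=1.0, eps=eps)
max_gap = max(abs(path[t] - closed_form(2.0, 0.9, 1.0, eps, t)) for t in range(26))
print(max_gap)
```

The largest gap between the iterated path and the closed form should differ from zero only by floating-point rounding.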


Iteration Without an Initial Condition 


Suppose you were not provided with the initial condition for $y_0$. The solution given by (1.18) would not be appropriate since the value of $y_0$ is an unknown. You could not select this initial value of $y$ and iterate forward, nor could you iterate backward from $y_t$ and simply choose to stop at $t = t_0$. Thus, suppose we continued to iterate backward by substituting $a_0 + a_1 y_{-1} + \varepsilon_0$ for $y_0$ in (1.18):

$$\begin{aligned} y_t &= a_0\sum_{i=0}^{t-1} a_1^i + a_1^t(a_0 + a_1 y_{-1} + \varepsilon_0) + \sum_{i=0}^{t-1} a_1^i \varepsilon_{t-i} \\ &= a_0\sum_{i=0}^{t} a_1^i + \sum_{i=0}^{t} a_1^i \varepsilon_{t-i} + a_1^{t+1} y_{-1} \end{aligned} \tag{1.19}$$

Continuing to iterate backward another $m$ periods, we obtain

$$y_t = a_0\sum_{i=0}^{t+m} a_1^i + \sum_{i=0}^{t+m} a_1^i \varepsilon_{t-i} + a_1^{t+m+1} y_{-m-1} \tag{1.20}$$

Now examine the pattern emerging from (1.19) and (1.20). If $|a_1| < 1$, the term $a_1^{t+m+1}$ approaches zero as $m$ approaches infinity. Also, the infinite sum $[1 + a_1 + (a_1)^2 + \cdots]$ converges to $1/(1 - a_1)$. Thus, if we temporarily assume that $|a_1| < 1$, after continual substitution, (1.20) can be written as

$$y_t = a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i \varepsilon_{t-i} \tag{1.21}$$


You should take a few minutes to convince yourself that (1.21) is a solution to the original difference equation (1.17); substitution of (1.21) into (1.17) yields an identity. However, (1.21) is not a unique solution. For any arbitrary value of $A$, a solution to (1.17) is given by

$$y_t = Aa_1^t + a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i \varepsilon_{t-i} \tag{1.22}$$

To verify that for any arbitrary value of $A$, (1.22) is a solution, substitute (1.22) into (1.17) to obtain

$$a_0/(1 - a_1) + Aa_1^t + \sum_{i=0}^{\infty} a_1^i \varepsilon_{t-i} = a_0 + a_1\left[a_0/(1 - a_1) + Aa_1^{t-1} + \sum_{i=0}^{\infty} a_1^i \varepsilon_{t-1-i}\right] + \varepsilon_t$$
Since the two sides are identical, (1.22) is necessarily a solution to (1.17). 
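The equality of the two sides can be seen by expanding the right-hand side term by term. The worked step below is a sketch added for convenience; it uses only (1.17), (1.22), and the assumption $|a_1| < 1$.

```latex
\begin{aligned}
a_0 + a_1\left[\frac{a_0}{1-a_1} + A a_1^{t-1}
      + \sum_{i=0}^{\infty} a_1^{i}\varepsilon_{t-1-i}\right] + \varepsilon_t
  &= \frac{a_0(1-a_1) + a_0 a_1}{1-a_1} + A a_1^{t}
      + \sum_{i=0}^{\infty} a_1^{i+1}\varepsilon_{t-1-i} + \varepsilon_t \\
  &= \frac{a_0}{1-a_1} + A a_1^{t}
      + \sum_{i=1}^{\infty} a_1^{i}\varepsilon_{t-i} + \varepsilon_t \\
  &= \frac{a_0}{1-a_1} + A a_1^{t}
      + \sum_{i=0}^{\infty} a_1^{i}\varepsilon_{t-i}
\end{aligned}
```

The last line is the left-hand side, so the constant $A$ drops out of the comparison entirely.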


Reconciling the Two Iterative Methods 


Given the iterative solution (1.22), suppose that you are now given an initial condition concerning the value of $y$ in the arbitrary period $t_0$. It is straightforward to show that we can impose the initial condition on (1.22) to yield the same solution as (1.18). Since (1.22) must be valid for all periods (including $t_0$), then when $t = 0$, it must be true that

$$y_0 = A + a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i \varepsilon_{-i}$$

so that

$$A = y_0 - a_0/(1 - a_1) - \sum_{i=0}^{\infty} a_1^i \varepsilon_{-i} \tag{1.23}$$

Since $y_0$ is given, we can view (1.23) as the value of $A$ that renders (1.22) a solution to (1.17) given the initial condition. Hence, the presence of the initial condition eliminates the "arbitrariness" of $A$. Substituting this value of $A$ into (1.22) yields

$$y_t = \left[y_0 - a_0/(1 - a_1) - \sum_{i=0}^{\infty} a_1^i \varepsilon_{-i}\right]a_1^t + a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i \varepsilon_{t-i} \tag{1.24}$$

Simplification of (1.24) results in

$$y_t = [y_0 - a_0/(1 - a_1)]a_1^t + a_0/(1 - a_1) + \sum_{i=0}^{t-1} a_1^i \varepsilon_{t-i} \tag{1.25}$$


You should take a moment to verify that (1.25) is identical to (1.18). 
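One way to carry out this verification is to compare the deterministic parts of the two expressions, since the stochastic sums are already identical. The step below is a sketch that uses the finite geometric sum and assumes $a_1 \neq 1$.

```latex
\begin{aligned}
\left[y_0 - \frac{a_0}{1-a_1}\right]a_1^{t} + \frac{a_0}{1-a_1}
  &= a_1^{t} y_0 + a_0\,\frac{1 - a_1^{t}}{1-a_1} \\
  &= a_1^{t} y_0 + a_0\sum_{i=0}^{t-1} a_1^{i}
\end{aligned}
```

Adding $\sum_{i=0}^{t-1} a_1^i \varepsilon_{t-i}$ to both sides gives (1.25) on the left and (1.18) on the right.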


Nonconvergent Sequences 


Given that $|a_1| < 1$, (1.21) is the limiting value of (1.20) as $m$ grows infinitely large. What happens to the solution in other circumstances? If $|a_1| > 1$, it is not possible to move from (1.20) to (1.21) since the expression $|a_1|^{t+m+1}$ grows infinitely large as $t + m$ approaches infinity.¹ However, if there is an initial condition, there is no need to obtain the infinite summation. Simply select the initial condition $y_0$ and iterate forward; the result will be (1.18):

$$y_t = a_0\sum_{i=0}^{t-1} a_1^i + a_1^t y_0 + \sum_{i=0}^{t-1} a_1^i \varepsilon_{t-i}$$

Although the successive values of the $\{y_t\}$ sequence will become progressively larger in absolute value, all values in the series will be finite.

A very interesting case arises if $a_1 = 1$. Rewrite (1.17) as

$$y_t = a_0 + y_{t-1} + \varepsilon_t$$

or

$$\Delta y_t = a_0 + \varepsilon_t$$

As you should verify by iterating from $y_t$ back to $y_0$, a solution to this equation is²

$$y_t = a_0 t + \sum_{i=1}^{t} \varepsilon_i + y_0 \tag{1.26}$$


After a moment's reflection, the form of the solution is quite intuitive. In every period $t$, the value of $y_t$ changes by $a_0 + \varepsilon_t$ units. After $t$ periods, there are $t$ such changes; hence, the total change is $ta_0$ plus the $t$ values of the $\{\varepsilon_t\}$ sequence. Notice that the solution contains the summation of all disturbances from $\varepsilon_1$ through $\varepsilon_t$. Thus, when $a_1 = 1$, each disturbance has a permanent nondecaying effect on the value of $y_t$. You should compare this result to the solution found in (1.21). For the case in which $|a_1| < 1$, $|a_1|^t$ is a decreasing function of $t$ so that the effects of past disturbances become successively smaller over time.

[Figure 1.2 Convergent and nonconvergent sequences. Panels: (a) $y_t = 0.9y_{t-1} + \varepsilon_t$; (b) $y_t = 0.5y_{t-1} + \varepsilon_t$; (c) $y_t = -0.5y_{t-1} + \varepsilon_t$; (d) $y_t = y_{t-1} + \varepsilon_t$; (e) $a_1 = 1.2$; lower-right panel $a_1 = -1.2$.]

The importance of the magnitude of $a_1$ is illustrated in Figure 1.2. Twenty-five random numbers with a theoretical mean equal to zero were computer-generated and denoted by $\varepsilon_1$ through $\varepsilon_{25}$. Then the value of $y_0$ was set equal to unity and the next 25 values of the $\{y_t\}$ sequence were constructed using the formula $y_t = 0.9y_{t-1} + \varepsilon_t$. The result is shown by the thin line in part (a) of Figure 1.2. If you substitute $a_0 = 0$ and $a_1 = 0.9$ into (1.18), you will see that the time path of $\{y_t\}$ consists of two parts. The first part, $(0.9)^t$, is shown by the slowly decaying thick line in the (a) panel of the figure. This term dominates the solution for relatively small values of $t$. The influence of the random part is shown by the difference between the thin and thick lines; you can see that the first several values of $\{\varepsilon_t\}$ are negative. As $t$ increases, the influence of the random component becomes more pronounced.

Using the previously drawn random numbers, we again set $y_0$ equal to unity and a second sequence was constructed using the formula $y_t = 0.5y_{t-1} + \varepsilon_t$. This second sequence is shown by the thin line in part (b) of Figure 1.2. The influence of the expression $(0.5)^t$ is shown by the rapidly decaying thick line. Again, as $t$ increases, the random portion of the solution becomes more dominant in the time path of $\{y_t\}$.

When we compare the first two panels, it is clear that reducing the magnitude of $|a_1|$ increases the rate of convergence. Moreover, the discrepancies between the simulated values of $y_t$ and the thick line are less pronounced in the second part. As you can see in (1.18), each value of $\varepsilon_{t-i}$ enters the solution for $y_t$ with a coefficient of $(a_1)^i$. The smaller value of $a_1$ means that the past realizations of $\varepsilon_{t-i}$ have a smaller influence on the current value of $y_t$.

Simulating a third sequence with $a_1 = -0.5$ yields the thin line shown in part (c). The oscillations are due to the negative value of $a_1$. The expression $(-0.5)^t$, shown by the thick line, is positive when $t$ is even and negative when $t$ is odd. Since $|a_1| < 1$, the oscillations are dampened.

The next three parts of Figure 1.2 all show nonconvergent sequences. Each uses the initial condition $y_0 = 1$ and the same 25 values of $\{\varepsilon_t\}$ used in the other simulations. The thin line in part (d) shows the time path of $y_t = y_{t-1} + \varepsilon_t$. Since each value of $\varepsilon_t$ has an expected value of zero, part (d) illustrates a random walk process. Here, $\Delta y_t = \varepsilon_t$ so that the change in $y_t$ is random. The nonconvergence is shown by the tendency of $\{y_t\}$ to meander. In part (e), the thick line representing the explosive expression $(1.2)^t$ dominates the random portion of the $\{y_t\}$ sequence. Also notice that the discrepancy between the simulated $\{y_t\}$ sequence and the thick line widens as $t$ increases. The reason is that past values of $\varepsilon_{t-i}$ enter the solution for $y_t$ with the coefficient $(1.2)^i$. As $i$ increases, the importance of these previous discrepancies becomes increasingly significant. Similarly, setting $a_1 = -1.2$ results in the exploding oscillations shown in the lower-right part of Figure 1.2. The value $(-1.2)^t$ is positive for even values of $t$ and negative for odd values of $t$.
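The six experiments can be reproduced in outline with a short simulation. This is a sketch only: the random draws below are not the numbers used for Figure 1.2, and the seed and the standard normal distribution are assumptions made for illustration.

```python
import random


def simulate(a1, eps, y0=1.0):
    """Simulate y_t = a1*y_{t-1} + eps_t (with a0 = 0) starting from y_0."""
    path = [y0]
    for e in eps:
        path.append(a1 * path[-1] + e)
    return path


random.seed(42)  # assumed seed; it only makes the run repeatable
eps = [random.gauss(0, 1) for _ in range(25)]  # assumed N(0, 1) disturbances

for a1 in (0.9, 0.5, -0.5, 1.0, 1.2, -1.2):
    simulated = simulate(a1, eps)
    thick_line = [a1 ** t for t in range(26)]  # the term a1^t * y_0 with y_0 = 1
    random_part = simulated[25] - thick_line[25]
    print(f"a1={a1:5.1f}  a1^25={thick_line[25]:10.4f}  "
          f"y_25={simulated[25]:10.4f}  random part={random_part:10.4f}")
```

For each value of $a_1$, the gap between the simulated value and the thick line equals the weighted sum of past disturbances in (1.18), so it stays bounded in probability when $|a_1| < 1$ and tends to grow when $|a_1| \geq 1$.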




4. AN ALTERNATIVE SOLUTION METHODOLOGY 


Solution by the iterative method breaks down in higher-order equations. The algebraic complexity quickly overwhelms any reasonable attempt to find a solution. Fortunately, there are several alternative solution techniques that can be helpful in solving the nth-order equation given by (1.10). Using the principle that you should learn to walk before you learn to run, we see that it is best to step through the first-order equation given by (1.17). Although you will be covering some familiar ground, the first-order case illustrates the general methodology extremely well. To split the procedure into its component parts, consider only the homogeneous portion of (1.17):

$$y_t = a_1 y_{t-1} \tag{1.27}$$


The solution to this homogeneous equation is called the homogeneous solution; at times, it will be useful to denote the homogeneous solution by the expression $y_t^h$.

Obviously, the trivial solution $y_t = y_{t-1} = \cdots = 0$ satisfies (1.27). However, this solution is not unique. By setting $a_0$ and all values of $\{\varepsilon_t\}$ equal to zero, (1.18) becomes $y_t = a_1^t y_0$. Hence, $y_t = a_1^t y_0$ must be a solution to (1.27). However, even this solution does not constitute the full set of solutions. It is easy to verify that the expression $a_1^t$ multiplied by any arbitrary constant $A$ satisfies (1.27). Simply substitute $y_t = A(a_1)^t$ and $y_{t-1} = A(a_1)^{t-1}$ into (1.27) to obtain

$$A(a_1)^t = a_1 A(a_1)^{t-1}$$

Since $a_1^t = a_1(a_1)^{t-1}$, it follows that $y_t^h = A(a_1)^t$ solves (1.27). With the aid of the thick lines in Figure 1.2, we can classify the properties of the homogeneous solution as follows:


1. If $|a_1| < 1$, the expression $(a_1)^t$ converges to zero as $t$ approaches infinity. Convergence is direct if $0 < a_1 < 1$ and oscillatory if $-1 < a_1 < 0$.

2. If $|a_1| > 1$, the homogeneous solution is not stable. If $a_1 > 1$, the homogeneous solution approaches infinity as $t$ increases. If $a_1 < -1$, the homogeneous solution oscillates explosively.

3. If $a_1 = 1$, any arbitrary constant $A$ satisfies the homogeneous equation $y_t = y_{t-1}$. If $a_1 = -1$, the system is meta-stable: $(a_1)^t = 1$ for even values of $t$ and $-1$ for odd values of $t$.
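The three cases above can be summarized in a small helper. This is an illustrative sketch: the function name `classify_homogeneous` and the wording of the labels are assumptions, and the boundary value $a_1 = 0$, which the list does not discuss, is given its own label.

```python
def classify_homogeneous(a1):
    """Describe the behaviour of A*(a1)**t as t grows, following the three cases."""
    if abs(a1) < 1:
        if a1 > 0:
            return "convergent, direct"
        if a1 < 0:
            return "convergent, oscillatory"
        return "convergent (a1 = 0, not covered in the text)"
    if abs(a1) > 1:
        return "explosive, direct" if a1 > 0 else "explosive, oscillatory"
    # Remaining cases: a1 equals 1 or -1 exactly.
    return "constant at A" if a1 == 1 else "meta-stable (alternates between A and -A)"


for a1 in (0.9, -0.5, 1.2, -1.2, 1, -1):
    print(a1, "->", classify_homogeneous(a1))
```

The values passed in the loop are the six coefficients used for the panels of Figure 1.2.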


Now consider (1.17) in its entirety. In the last section, you confirmed that (1.21) is a valid solution to (1.17). Equation (1.21) is called a particular solution to the difference equation; all such particular solutions will be denoted by the term $y_t^p$. The term "particular" stems from the fact that a solution to a difference equation may not be unique; hence, (1.21) is just one particular solution out of the many possibilities.

In moving to (1.22), you verified that the particular solution was not unique. The homogeneous solution $Aa_1^t$ plus the particular solution given by (1.21) constituted the complete solution to (1.17). The general solution to a difference equation is defined to be a particular solution plus all homogeneous solutions. Once the general solution is obtained, the arbitrary constant $A$ can be eliminated by imposing an initial condition for $y_0$.


The Solution Methodology 


The results of the first-order case are directly applicable to the nth-order equation given by (1.10). In this general case, it will be more difficult to find the particular solution and there will be $n$ distinct homogeneous solutions. Nevertheless, the solution methodology will always entail the following four steps:


STEP 1: Form the homogeneous equation and find all n homogeneous solutions. 


STEP 2: Find a particular solution. 


STEP 3: Obtain the general solution as the sum of the particular solution and a lin- 
ear combination of all homogeneous solutions. 


STEP 4: Eliminate the arbitrary constant(s) by imposing the initial condition(s) on 
the general solution. 


Before we address the various techniques that can be used to obtain homogeneous and particular solutions, it is worthwhile to illustrate the methodology using the equation:

$$y_t = 0.9y_{t-1} - 0.2y_{t-2} + 3 \tag{1.28}$$

Clearly, this second-order equation is in the form of (1.10) with $a_0 = 3$, $a_1 = 0.9$, $a_2 = -0.2$, and $x_t = 0$. Beginning with the first of the four steps, form the homogeneous equation:

$$y_t - 0.9y_{t-1} + 0.2y_{t-2} = 0 \tag{1.29}$$

In the first-order case of (1.17), the homogeneous solution was $A(a_1)^t$. Section 6 will show you how to find the complete set of homogeneous solutions. For now, it is sufficient to assert that the two homogeneous solutions are $y_{1t}^h = (0.5)^t$ and $y_{2t}^h = (0.4)^t$. To verify the first solution, note that $y_{1t-1}^h = (0.5)^{t-1}$ and $y_{1t-2}^h = (0.5)^{t-2}$. Thus, $y_{1t}^h$ is a solution if it satisfies

$$(0.5)^t - 0.9(0.5)^{t-1} + 0.2(0.5)^{t-2} = 0$$

If we divide by $(0.5)^{t-2}$, the issue is whether

$$(0.5)^2 - 0.9(0.5) + 0.2 = 0$$

Carrying out the algebra, $0.25 - 0.45 + 0.2$ does equal zero so that $(0.5)^t$ is a solution to (1.29). In the same way, it is easy to verify that $y_{2t}^h = (0.4)^t$ is a solution since

$$(0.4)^t - 0.9(0.4)^{t-1} + 0.2(0.4)^{t-2} = 0$$

Divide by $(0.4)^{t-2}$ to obtain $(0.4)^2 - 0.9(0.4) + 0.2 = 0.16 - 0.36 + 0.2 = 0$.

The second step is to obtain a particular solution; you can easily confirm that the particular solution $y_t^p = 10$ solves (1.28) as $10 = 0.9(10) - 0.2(10) + 3$.

The third step is to combine the particular solution and a linear combination of both homogeneous solutions to obtain

$$y_t = A_1(0.5)^t + A_2(0.4)^t + 10$$

where $A_1$ and $A_2$ are arbitrary constants.

For the fourth step, assume you have two initial conditions for the $\{y_t\}$ sequence. So that we can keep our numbers reasonably round, suppose that $y_0 = 13$ and $y_1 = 11.3$. Thus, for periods zero and one, our solution must satisfy

$$13 = A_1 + A_2 + 10$$
$$11.3 = A_1(0.5) + A_2(0.4) + 10$$

Solving simultaneously for $A_1$ and $A_2$, you should find $A_1 = 1$ and $A_2 = 2$. Hence, the solution is

$$y_t = (0.5)^t + 2(0.4)^t + 10$$
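The fourth step and the final solution can be cross-checked numerically. The sketch below solves the two initial-condition equations by elimination and then compares the closed-form solution with direct iteration of (1.28). The variable names and the 20-period horizon are illustrative assumptions.

```python
# Step 4: solve  A1 + A2 = y0 - 10  and  0.5*A1 + 0.4*A2 = y1 - 10  by elimination.
y0, y1, particular = 13.0, 11.3, 10.0
r1, r2 = 0.5, 0.4  # the two homogeneous roots asserted in the text

A2 = ((y1 - particular) - r1 * (y0 - particular)) / (r2 - r1)
A1 = (y0 - particular) - A2


def solution(t):
    """General solution with the constants pinned down by the initial conditions."""
    return A1 * r1 ** t + A2 * r2 ** t + particular


# Cross-check against direct iteration of (1.28) from the two initial conditions.
path = [y0, y1]
for t in range(2, 21):
    path.append(0.9 * path[-1] - 0.2 * path[-2] + 3)

max_gap = max(abs(path[t] - solution(t)) for t in range(21))
print(round(A1, 6), round(A2, 6), max_gap)
```

Up to rounding, the constants should come out as $A_1 = 1$ and $A_2 = 2$, and the iterated path should coincide with the closed form.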


Generalizing the Method

To show that the method is applicable to higher-order equations, consider the homogeneous part of (1.10):

$$y_t = \sum_{i=1}^{n} a_i y_{t-i} \tag{1.30}$$

As shown in Section 6, there are $n$ homogeneous solutions that satisfy (1.30). For now, it is sufficient to demonstrate the following proposition: If $y_t^h$ is a homogeneous solution to (1.30), $Ay_t^h$ is also a solution for any arbitrary constant $A$. By assumption, $y_t^h$ solves the homogeneous equation so that

$$y_t^h = \sum_{i=1}^{n} a_i y_{t-i}^h \tag{1.31}$$

The expression $Ay_t^h$ is also a solution if:

$$Ay_t^h = \sum_{i=1}^{n} a_i A y_{t-i}^h \tag{1.32}$$

We know (1.32) is satisfied since dividing each term by $A$ yields (1.31). Now suppose that there are two separate solutions to the homogeneous equation denoted by $y_{1t}^h$ and $y_{2t}^h$. It is straightforward to show that for any two constants $A_1$ and $A_2$, the linear combination $A_1 y_{1t}^h + A_2 y_{2t}^h$ is also a solution to the homogeneous equation. If $A_1 y_{1t}^h + A_2 y_{2t}^h$ is a solution to (1.30), it must satisfy

$$A_1 y_{1t}^h + A_2 y_{2t}^h = a_1(A_1 y_{1t-1}^h + A_2 y_{2t-1}^h) + a_2(A_1 y_{1t-2}^h + A_2 y_{2t-2}^h) + \cdots + a_n(A_1 y_{1t-n}^h + A_2 y_{2t-n}^h)$$

Regrouping terms, we want to know if

$$\left(A_1 y_{1t}^h - \sum_{i=1}^{n} a_i A_1 y_{1t-i}^h\right) + \left(A_2 y_{2t}^h - \sum_{i=1}^{n} a_i A_2 y_{2t-i}^h\right) = 0$$

Since $A_1 y_{1t}^h$ and $A_2 y_{2t}^h$ are separate solutions to (1.30), each of the expressions in parentheses is zero. Hence, the linear combination is necessarily a solution to the homogeneous equation. This result easily generalizes to all $n$ homogeneous solutions to an nth-order equation.

Finally, the use of Step 3 is appropriate since the sum of any particular solution and any linear combination of all homogeneous solutions is also a solution. To prove the proposition, substitute the sum of the particular and homogeneous solutions into (1.10) to obtain

$$y_t^p + y_t^h = a_0 + \sum_{i=1}^{n} a_i(y_{t-i}^p + y_{t-i}^h) + x_t \tag{1.33}$$

Recombining the terms in (1.33), we want to know if

$$\left(y_t^p - a_0 - \sum_{i=1}^{n} a_i y_{t-i}^p - x_t\right) + \left(y_t^h - \sum_{i=1}^{n} a_i y_{t-i}^h\right) = 0 \tag{1.34}$$

Since $y_t^p$ solves (1.10), the expression in the first set of parentheses of (1.34) is zero. Since $y_t^h$ solves the homogeneous equation, the expression in the second set of parentheses is zero. Thus, (1.34) is an identity; the sum of the homogeneous and particular solutions solves (1.10).


5. THE COBWEB MODEL 


An interesting way to illustrate the methodology outlined in the previous section is 
to consider a stochastic version of the traditional cobweb model. Since the model 
was originally developed to explain the volatility in agricultural prices, let the mar- 
ket for a product—say, wheat—be represented by 


$$d_t = a - \gamma p_t, \qquad \gamma > 0 \tag{1.35}$$
$$s_t = b + \beta p_t^* + \varepsilon_t, \qquad \beta > 0 \tag{1.36}$$
$$s_t = d_t \tag{1.37}$$

where $d_t$ = demand for wheat in period $t$
      $s_t$ = supply of wheat in $t$
      $p_t$ = market price of wheat in $t$
      $p_t^*$ = price that farmers expect to prevail at $t$
      $\varepsilon_t$ = a zero mean stochastic supply shock

and parameters $a$, $b$, $\gamma$, and $\beta$ are all positive such that $a > b$.


The nature of the model is such that consumers buy as much wheat as desired at the market clearing price $p_t$. At planting time, farmers do not know the price prevailing at harvest time; they base their supply decision on the expected price ($p_t^*$). The actual quantity produced depends on the planned quantity $b + \beta p_t^*$ plus a random supply shock $\varepsilon_t$. Once the product is harvested, market equilibrium requires that the quantity supplied equals the quantity demanded. Unlike the actual market for wheat, the model ignores the possibility of storage. The essence of the cobweb model is that farmers form their expectations in a naive fashion; let farmers use last year's price as the expected market price:

$$p_t^* = p_{t-1} \tag{1.38}$$
Point E in Figure 1.3 represents the long-run equilibrium price and quantity combination. Note that the equilibrium concept in this stochastic model differs from that of the traditional cobweb model. If the system is stable, successive prices will tend to converge to point E. However, the nature of the stochastic equilibrium is such that the ever-present supply shocks prevent the system from remaining at E. Nevertheless, it is useful to solve for the long-run price. If we set all values of the $\{\varepsilon_t\}$ sequence equal to zero, set $p_t = p_{t-1} = \cdots = p$, and equate supply and demand, the long-run equilibrium price is given by $p = (a - b)/(\gamma + \beta)$. Similarly, the equilibrium quantity ($s$) is given by $s = (a\beta + \gamma b)/(\gamma + \beta)$.

To understand the dynamics of the system, suppose that farmers in $t$ plan to produce the equilibrium quantity $s$. However, let there be a negative supply shock such that the actual quantity produced turns out to be $s_t$. As shown by point 1 in Figure 1.3, consumers are willing to pay $p_t$ for the quantity $s_t$; hence, market equilibrium in $t$ occurs at point 1. Updating one period allows us to see the main result of the cobweb model. For simplicity, assume that all subsequent values of the supply shock are zero (i.e., $\varepsilon_{t+1} = \varepsilon_{t+2} = \cdots = 0$). At the beginning of period $t + 1$, farmers expect the price at harvest time to be that of the previous period; thus, $p_{t+1}^* = p_t$. Accordingly, they produce and market quantity $s_{t+1}$ (see point 2 in the figure); consumers, however, are willing to buy quantity $s_{t+1}$ only if the price falls to that indicated by $p_{t+1}$ (see point 3 in the figure). The next period begins with farmers expecting to be at point 4. The process continually repeats itself until the equilibrium point E is attained.
[Figure 1.3 The cobweb model. Axes: Price, Quantity.]

As drawn, Figure 1.3 suggests that the market will always converge to the long-run equilibrium point. This result does not hold for all demand and supply curves. To formally derive the stability condition, combine (1.35) through (1.38) to obtain

$$b + \beta p_{t-1} + \varepsilon_t = a - \gamma p_t$$

or

$$p_t = (-\beta/\gamma)p_{t-1} + (a - b)/\gamma - \varepsilon_t/\gamma \tag{1.39}$$

Clearly, (1.39) is a stochastic first-order linear difference equation with constant coefficients. To obtain the general solution, proceed using the four steps listed at the end of the last section:

1. Form the homogeneous equation: $p_t = (-\beta/\gamma)p_{t-1}$. In the next section, you will learn how to find the solution(s) to a homogeneous equation. For now, it is sufficient to verify that the homogeneous solution is

$$p_t^h = A(-\beta/\gamma)^t$$

where $A$ is an arbitrary constant.

2. If the ratio $\beta/\gamma$ is less than unity, you can iterate (1.39) backward from $p_t$ to verify that the particular solution for the price is

$$p_t^p = (a - b)/(\gamma + \beta) - (1/\gamma)\sum_{i=0}^{\infty} (-\beta/\gamma)^i \varepsilon_{t-i} \tag{1.40}$$

If $\beta/\gamma \geq 1$, the infinite summation in (1.40) is not convergent. As discussed in the last section, it is necessary to impose an initial condition on (1.40) if $\beta/\gamma \geq 1$.

3. The general solution is the sum of the homogeneous and particular solutions; if we combine these two solutions, the general solution is

$$p_t = (a - b)/(\gamma + \beta) - (1/\gamma)\sum_{i=0}^{\infty} (-\beta/\gamma)^i \varepsilon_{t-i} + A(-\beta/\gamma)^t \tag{1.41}$$

4. In (1.41), $A$ is an arbitrary constant that can be eliminated if we know the price in some initial period. For convenience, let this initial period have a time subscript of zero. Since the solution must hold for every period, including period zero, it must be the case that

$$p_0 = (a - b)/(\gamma + \beta) - (1/\gamma)\sum_{i=0}^{\infty} (-\beta/\gamma)^i \varepsilon_{-i} + A(-\beta/\gamma)^0$$

Since $(-\beta/\gamma)^0 = 1$, the value of $A$ is given by

$$A = p_0 - (a - b)/(\gamma + \beta) + (1/\gamma)\sum_{i=0}^{\infty} (-\beta/\gamma)^i \varepsilon_{-i}$$

Substituting this solution for $A$ back into (1.41) yields

$$p_t = (a - b)/(\gamma + \beta) - (1/\gamma)\sum_{i=0}^{\infty} (-\beta/\gamma)^i \varepsilon_{t-i} + (-\beta/\gamma)^t\left[p_0 - (a - b)/(\gamma + \beta) + (1/\gamma)\sum_{i=0}^{\infty} (-\beta/\gamma)^i \varepsilon_{-i}\right]$$

and after simplification of the two summations,

$$p_t = (a - b)/(\gamma + \beta) - (1/\gamma)\sum_{i=0}^{t-1} (-\beta/\gamma)^i \varepsilon_{t-i} + (-\beta/\gamma)^t[p_0 - (a - b)/(\gamma + \beta)] \tag{1.42}$$

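We can interpret (1.42) in terms of Figure 1.3. In order to focus on the stability of the system, temporarily assume that all values of the $\{\varepsilon_t\}$ sequence are zero. Subsequently, we will return to a consideration of the effects of supply shocks. If the system begins in long-run equilibrium, so that $p_0 = (a - b)/(\gamma + \beta)$, inspection of (1.42) shows that $p_t = (a - b)/(\gamma + \beta)$ in every period; the system remains in long-run equilibrium. Instead, suppose that the process begins at a price below long-run equilibrium: $p_0 < (a - b)/(\gamma + \beta)$. Equation (1.42) tells us that $p_1$ is

$$p_1 = (a - b)/(\gamma + \beta) + [p_0 - (a - b)/(\gamma + \beta)](-\beta/\gamma)^1 \tag{1.43}$$

Since $p_0 < (a - b)/(\gamma + \beta)$ and $-\beta/\gamma < 0$, it follows that $p_1$ will be above the long-run equilibrium price $(a - b)/(\gamma + \beta)$. In period 2,

$$p_2 = (a - b)/(\gamma + \beta) + [p_0 - (a - b)/(\gamma + \beta)](-\beta/\gamma)^2$$

Although $p_0 < (a - b)/(\gamma + \beta)$, $(-\beta/\gamma)^2$ is positive; hence, $p_2$ is below the long-run equilibrium. For the subsequent periods, note that $(-\beta/\gamma)^t$ will be positive for even values of $t$ and negative for odd values of $t$. Just as we found graphically, the successive values of the $\{p_t\}$ sequence will oscillate above and below the long-run equilibrium price. Since $(\beta/\gamma)^t$ goes to zero if $\beta < \gamma$ and explodes if $\beta > \gamma$, the magnitude of $\beta/\gamma$ determines whether the price converges to the long-run equilibrium: the oscillations diminish if $\beta/\gamma < 1$ and are explosive if $\beta/\gamma > 1$.

The economic interpretation of this stability condition is straightforward. The slope of the supply curve [i.e., $\partial p_t/\partial s_t$] is $1/\beta$ and the absolute value of the slope of the demand curve [i.e., $-\partial p_t/\partial d_t$] is $1/\gamma$. If the supply curve is steeper than the demand curve, $1/\beta > 1/\gamma$ or $\beta/\gamma < 1$, so that the system is stable.

Now consider the effects of the supply shocks. The contemporaneous effect of a supply shock on the price of wheat is the partial derivative of $p_t$ with respect to $\varepsilon_t$; from (1.42), we obtain

$$\partial p_t/\partial \varepsilon_t = -1/\gamma \tag{1.44}$$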
Equation (1.44) is called the impact multiplier since it shows the impact effect of a change in $\varepsilon_t$ on the price in $t$. In terms of Figure 1.3, a negative value of $\varepsilon_t$ implies a price above the long-run price $p$; the price in $t$ rises by $1/\gamma$ units for each unit decline in current period's supply. Of course, this terminology is not specific to the cobweb model; in terms of the nth-order model given by (1.10), the impact multiplier is the partial derivative of $y_t$ with respect to the partial change in the forcing process.³

The effects of the supply shock in $t$ persist into future periods. Updating (1.42) by one period yields the one-period multiplier:

$$\partial p_{t+1}/\partial \varepsilon_t = -(1/\gamma)(-\beta/\gamma) = \beta/\gamma^2$$

Point 3 in Figure 1.3 illustrates how the price in $t + 1$ is affected by the negative supply shock in $t$. It is straightforward to derive the result that the effects of the supply shock decay over time. Since $\beta/\gamma < 1$, the absolute value of $\partial p_t/\partial \varepsilon_t$ exceeds $\partial p_{t+1}/\partial \varepsilon_t$. All the multipliers can be derived analogously; updating (1.42) by two periods yields:

$$\partial p_{t+2}/\partial \varepsilon_t = -(1/\gamma)(-\beta/\gamma)^2$$

and after $n$ periods,

$$\partial p_{t+n}/\partial \varepsilon_t = -(1/\gamma)(-\beta/\gamma)^n$$

The time path of all such multipliers is called the impulse response function. This function has many important applications in time-series analysis since it shows how the entire time path of a variable is affected by a stochastic shock. Here, the impulse response function traces out the effects of a supply shock in the wheat market. In other economic applications, you may be interested in the time path of a money supply shock or a productivity shock on real GNP.

In actuality, the function can be derived without updating (1.42) since it is always the case that:

$$\partial p_{t+j}/\partial \varepsilon_t = \partial p_t/\partial \varepsilon_{t-j}$$

To find the impulse response function, simply find the partial derivative of (1.42) with respect to the various $\varepsilon_{t-j}$. These partial derivatives are nothing more than the coefficients of the $\{\varepsilon_{t-i}\}$ sequence in (1.42).
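The multipliers can be checked by simulation. Because (1.39) is linear, the difference between a price path hit by a one-unit shock and an otherwise identical unshocked path traces out the impulse response function. The sketch below makes that comparison; the parameter values and function names are hypothetical, chosen only so that $\beta/\gamma < 1$.

```python
def cobweb_path(p0, shocks, a, b, gamma, beta):
    """Iterate (1.39): p_t = (-beta/gamma)*p_{t-1} + (a - b)/gamma - eps_t/gamma."""
    path = [p0]
    for e in shocks:
        path.append((-beta / gamma) * path[-1] + (a - b) / gamma - e / gamma)
    return path


def impulse_response(j, gamma, beta):
    """Multiplier dp_{t+j}/d eps_t = -(1/gamma) * (-beta/gamma)**j."""
    return -(1.0 / gamma) * (-beta / gamma) ** j


# Hypothetical parameter values (assumptions, not taken from the text).
a, b, gamma, beta = 10.0, 2.0, 1.0, 0.8
p_bar = (a - b) / (gamma + beta)  # long-run equilibrium price
horizon = 10

baseline = cobweb_path(p_bar, [0.0] * horizon, a, b, gamma, beta)
shocked = cobweb_path(p_bar, [1.0] + [0.0] * (horizon - 1), a, b, gamma, beta)

# The shock arrives in the first simulated period, so responses start at index 1.
simulated = [shocked[k + 1] - baseline[k + 1] for k in range(horizon)]
analytic = [impulse_response(j, gamma, beta) for j in range(horizon)]
max_gap = max(abs(s - f) for s, f in zip(simulated, analytic))
print(analytic[:3], max_gap)
```

With these values the multipliers alternate in sign and shrink in absolute value, as the text describes for $\beta/\gamma < 1$.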

Each of the three components in (1.42) has a direct economic interpretation. The deterministic portion of the particular solution $(a - b)/(\gamma + \beta)$ is the long-run equilibrium price; if the stability condition is met, the $\{p_t\}$ sequence tends to converge to this long-run value. The stochastic component of the particular solution captures the short-run price adjustments due to the supply shocks. The ultimate decay of the coefficients of the impulse response function guarantees that the effects of changes in the various $\varepsilon_t$ are of a short-run duration. The third component is the expression $(-\beta/\gamma)^t A = (-\beta/\gamma)^t[p_0 - (a - b)/(\gamma + \beta)]$. The value of $A$ is the initial period's deviation of the price from its long-run equilibrium level. Given that $\beta/\gamma < 1$, the importance of this initial deviation diminishes over time.
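The role of the initial deviation can be illustrated with the deterministic part of (1.42). The sketch below sets every supply shock to zero and compares a stable and an unstable parameterization. All numerical values are hypothetical assumptions chosen for illustration.

```python
def deterministic_price(t, p0, a, b, gamma, beta):
    """Equation (1.42) with every supply shock set to zero."""
    p_bar = (a - b) / (gamma + beta)
    return p_bar + (p0 - p_bar) * (-beta / gamma) ** t


def deviations(p0, a, b, gamma, beta, periods=8):
    """Deviation of the price from long-run equilibrium in periods 0..periods-1."""
    p_bar = (a - b) / (gamma + beta)
    return [deterministic_price(t, p0, a, b, gamma, beta) - p_bar
            for t in range(periods)]


# Hypothetical parameters: the same demand curve with two different supply slopes.
stable = deviations(p0=3.0, a=10.0, b=2.0, gamma=1.0, beta=0.8)     # beta/gamma < 1
unstable = deviations(p0=3.0, a=10.0, b=2.0, gamma=1.0, beta=1.25)  # beta/gamma > 1

print([round(d, 3) for d in stable])
print([round(d, 3) for d in unstable])
```

In both runs the deviation changes sign every period; it shrinks toward zero in the first case and grows without bound in the second.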


6. SOLVING HOMOGENEOUS DIFFERENCE EQUATIONS 


Higher-order difference equations arise quite naturally in economic analysis. Equation (1.5), the reduced-form GNP equation resulting from Samuelson's (1939) model, is an example of a second-order difference equation. Moreover, in time-series econometrics, it is quite typical to estimate second- and higher-order equations. To begin our examination of homogeneous solutions, consider the second-order equation

$$y_t - a_1 y_{t-1} - a_2 y_{t-2} = 0 \tag{1.45}$$

Given the findings in the first-order case, you should suspect that the homogeneous solution has the form $y_t^h = A\alpha^t$. Substitution of this trial solution into (1.45) yields

$$A\alpha^t - a_1 A\alpha^{t-1} - a_2 A\alpha^{t-2} = 0 \tag{1.46}$$

Clearly, any arbitrary value of $A$ is satisfactory. If you divide (1.46) by $A\alpha^{t-2}$, the problem is to find the values of $\alpha$ that satisfy

$$\alpha^2 - a_1\alpha - a_2 = 0 \tag{1.47}$$

Solving this quadratic equation, called the characteristic equation, yields two values of $\alpha$, called the characteristic roots. Using the quadratic formula, we find that the two characteristic roots are

$$\alpha_1, \alpha_2 = \left(a_1 \pm \sqrt{a_1^2 + 4a_2}\right)/2 = (a_1 \pm \sqrt{d})/2 \tag{1.48}$$

where $d$ is the discriminant $[(a_1)^2 + 4a_2]$.


Each of these two characteristic roots yields a valid solution for (1.45). Again, these solutions are not unique. In fact, for any two arbitrary constants $A_1$ and $A_2$, the linear combination $A_1(\alpha_1)^t + A_2(\alpha_2)^t$ also solves (1.45). As proof, simply substitute $y_t = A_1(\alpha_1)^t + A_2(\alpha_2)^t$ into (1.45) to obtain

$$A_1(\alpha_1)^t + A_2(\alpha_2)^t = a_1[A_1(\alpha_1)^{t-1} + A_2(\alpha_2)^{t-1}] + a_2[A_1(\alpha_1)^{t-2} + A_2(\alpha_2)^{t-2}]$$

Now, regroup terms as follows:

$$A_1[(\alpha_1)^t - a_1(\alpha_1)^{t-1} - a_2(\alpha_1)^{t-2}] + A_2[(\alpha_2)^t - a_1(\alpha_2)^{t-1} - a_2(\alpha_2)^{t-2}] = 0$$

Since $\alpha_1$ and $\alpha_2$ each solve (1.45), both terms in brackets must equal zero. As such, the complete homogeneous solution in the second-order case is

$$y_t^h = A_1(\alpha_1)^t + A_2(\alpha_2)^t$$

Without knowing the specific values of $a_1$ and $a_2$, we cannot find the two characteristic roots $\alpha_1$ and $\alpha_2$. Nevertheless, it is possible to characterize the nature of the solution; there are three possible cases that are dependent on the value of the discriminant $d$.


CASE 1


If $a_1^2 + 4a_2 > 0$, $d$ is a real number and there will be two distinct real characteristic roots. Hence, there are two separate solutions to the homogeneous equation denoted by $(\alpha_1)^t$ and $(\alpha_2)^t$. We already know that any linear combination of the two is also a solution. Hence,

$$y_t^h = A_1(\alpha_1)^t + A_2(\alpha_2)^t$$

It should be clear that if the absolute value of either $\alpha_1$ or $\alpha_2$ exceeds unity, the homogeneous solution will explode. Worksheet 1.1 examines two second-order equations showing real and distinct characteristic roots. In the first example, $y_t = 0.2y_{t-1} + 0.35y_{t-2}$, the characteristic roots are shown to be $\alpha_1 = 0.7$ and $\alpha_2 = -0.5$. Hence, the full homogeneous solution is $y_t^h = A_1(0.7)^t + A_2(-0.5)^t$. Since both roots are less than unity in absolute value, the homogeneous solution is convergent. As you can see in the graph on the bottom left-hand side of Worksheet 1.1, convergence is not monotonic because of the influence of the expression $(-0.5)^t$.


WORKSHEET 1.1. Homogeneous Solutions: Second-Order Equations

CASE 1: $y_t = 0.2y_{t-1} + 0.35y_{t-2}$. Hence, $a_1 = 0.2$, $a_2 = 0.35$.
Form the homogeneous equation: $y_t - 0.2y_{t-1} - 0.35y_{t-2} = 0$.

A check of the discriminant reveals $d = (a_1)^2 + 4a_2$, so that $d = 1.44$. Given that $d > 0$, the roots will be real and distinct.

Let the trial solution have the form $y_t = \alpha^t$. Substitute into the homogeneous equation: $\alpha^t - 0.2\alpha^{t-1} - 0.35\alpha^{t-2} = 0$.

Divide by $\alpha^{t-2}$ in order to obtain the characteristic equation: $\alpha^2 - 0.2\alpha - 0.35 = 0$.

Compute the two characteristic roots:

$\alpha_1 = 0.5(a_1 + \sqrt{d}) = 0.7$
$\alpha_2 = 0.5(a_1 - \sqrt{d}) = -0.5$

The homogeneous solution is $A_1(0.7)^t + A_2(-0.5)^t$. The graph shows the time path of this solution for the case in which the arbitrary constants equal unity and $t$ runs from 1 to 20.

CASE 2: $y_t = 0.7y_{t-1} + 0.35y_{t-2}$. Hence, $a_1 = 0.7$, $a_2 = 0.35$.
Form the homogeneous equation: $y_t - 0.7y_{t-1} - 0.35y_{t-2} = 0$.

A check of the discriminant reveals $d = (a_1)^2 + 4a_2$, so that $d = 1.89$. Given that $d > 0$, the roots will be real and distinct.

Form the characteristic equation: $\alpha^t - 0.7\alpha^{t-1} - 0.35\alpha^{t-2} = 0$.

Compute the two characteristic roots:

$\alpha_1 = 0.5(a_1 + \sqrt{d}) = 1.037$
$\alpha_2 = 0.5(a_1 - \sqrt{d}) = -0.337$

The homogeneous solution is $A_1(1.037)^t + A_2(-0.337)^t$. The graph shows the time path of this solution for the case in which the arbitrary constants equal unity and $t$ runs from 1 to 20.

[Graphs: time paths of the homogeneous solutions for Case 1 and Case 2, $t$ = 1 to 20.]

In the second example, $y_t = 0.7y_{t-1} + 0.35y_{t-2}$. The worksheet indicates how to obtain the solution for the two characteristic roots. Given that one characteristic root is $(1.037)^t$, the $\{y_t\}$ sequence explodes. The influence of the negative root ($\alpha_2 = -0.337$) is responsible for the nonmonotonicity of the time path. Since $(-0.337)^t$ quickly approaches zero, the dominant root is the explosive value 1.037.
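The two worksheet cases can be reproduced numerically. The sketch below is only an illustration: the function names are not from the text, and time is indexed from $t = 0$ for convenience. It iterates the homogeneous equation forward and compares the result with the closed form $A_1(\alpha_1)^t + A_2(\alpha_2)^t$ for $A_1 = A_2 = 1$.

```python
import math


def iterate_homogeneous(a1, a2, y0, y1, periods):
    """Iterate y_t = a1*y_{t-1} + a2*y_{t-2} forward from two starting values."""
    path = [y0, y1]
    for _ in range(2, periods + 1):
        path.append(a1 * path[-1] + a2 * path[-2])
    return path


def closed_form_path(a1, a2, A1, A2, periods):
    """A1*alpha1**t + A2*alpha2**t for t = 0..periods (real, distinct roots only)."""
    d = a1 ** 2 + 4 * a2
    if d <= 0:
        raise ValueError("this sketch covers only the case d > 0")
    alpha1 = (a1 + math.sqrt(d)) / 2
    alpha2 = (a1 - math.sqrt(d)) / 2
    return [A1 * alpha1 ** t + A2 * alpha2 ** t for t in range(periods + 1)]


for a1, a2 in [(0.2, 0.35), (0.7, 0.35)]:
    exact = closed_form_path(a1, a2, 1.0, 1.0, 20)
    # Start the iteration from the closed form's own first two values.
    iterated = iterate_homogeneous(a1, a2, exact[0], exact[1], 20)
    gap = max(abs(x - y) for x, y in zip(exact, iterated))
    print(a1, a2, "largest gap between the two paths:", gap)
```

The first path shrinks toward zero with alternating deviations, while the second grows without bound, which matches the roles of the roots described above.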




CASE 2

If $a_1^2 + 4a_2 = 0$, it follows that $d = 0$ and $\alpha_1 = \alpha_2 = a_1/2$. Hence, a homogeneous solution is $(a_1/2)^t$. However, when $d = 0$, there is a second homogeneous solution given by $t(a_1/2)^t$. To demonstrate that $y_t^h = t(a_1/2)^t$ is a homogeneous solution, substitute it into (1.45) to determine whether

$$t(a_1/2)^t - a_1[(t-1)(a_1/2)^{t-1}] - a_2[(t-2)(a_1/2)^{t-2}] = 0$$

Divide by $(a_1/2)^{t-2}$ and form

$$-[(a_1^2/4) + a_2]t + [(a_1^2/2) + 2a_2] = 0$$

Since we are operating in the circumstance where $a_1^2 + 4a_2 = 0$, each bracketed expression is zero; hence, $t(a_1/2)^t$ solves (1.45). Again, for arbitrary constants $A_1$ and $A_2$, the complete homogeneous solution is

$$y_t^h = A_1(a_1/2)^t + A_2t(a_1/2)^t$$

Clearly, the system is explosive if $|a_1| > 2$. If $|a_1| < 2$, the term $A_1(a_1/2)^t$ converges, but you might think that the effect of the term $t(a_1/2)^t$ is ambiguous [since the diminishing $(a_1/2)^t$ is multiplied by $t$]. The ambiguity is correct in the limited sense that the behavior of the homogeneous solution is not monotonic. As illustrated in Figure 1.4 for $a_1/2 = 0.95$, $0.9$, and $-0.9$, as long as $|a_1| < 2$, $\lim[t(a_1/2)^t]$ is necessarily zero as $t \to \infty$; hence, there is always convergence. For $0 < a_1 < 2$, the homogeneous solution appears to explode before ultimately converging to zero. For $-2 < a_1 < 0$, the behavior is wildly erratic; the homogeneous solution appears to oscillate explosively before the oscillations dampen and finally converge to zero.

[Figure 1.4 The homogeneous solution $t(a_1/2)^t$.]
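The "explode, then converge" behavior of the repeated-root term is easy to see numerically. The following is a minimal sketch with an assumed value $a_1 = 1.8$ (so that $a_1/2 = 0.9$) and $a_2 = -a_1^2/4$; the function name is illustrative.

```python
def repeated_root_path(a1, periods):
    """Return t*(a1/2)**t for t = 0..periods."""
    root = a1 / 2
    return [t * root ** t for t in range(periods + 1)]


a1 = 1.8                 # assumed value, so the repeated root is 0.9
a2 = -a1 ** 2 / 4        # forces the discriminant a1**2 + 4*a2 to zero
path = repeated_root_path(a1, 200)

# The sequence should satisfy y_t - a1*y_{t-1} - a2*y_{t-2} = 0 at every t.
residuals = [path[t] - a1 * path[t - 1] - a2 * path[t - 2] for t in range(2, 201)]

print("largest residual:", max(abs(r) for r in residuals))
print("value at t = 1:", path[1])
print("largest value on the path:", max(path))
print("value at t = 200:", path[200])
```

The path first rises well above its starting values and only later decays toward zero, even though the root is inside the unit circle.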


CASE 3 


If $a_1^2 + 4a_2 < 0$, it follows that $d$ is negative so that the characteristic roots are imaginary. Since $a_1^2 \geq 0$, imaginary roots can occur only if $a_2 < 0$. Although hard to interpret directly, if we switch to polar coordinates, it is possible to transform the roots into more easily understood trigonometric functions. The technical details are presented in Appendix 1 of this chapter. For now, write the two characteristic roots as

$$\alpha_1 = (a_1 + i\sqrt{-d})/2, \qquad \alpha_2 = (a_1 - i\sqrt{-d})/2$$

where $i = \sqrt{-1}$.

As shown in Appendix 1, you can use de Moivre's theorem to write the homogeneous solution as

$$y_t^h = \beta_1 r^t \cos(\theta t + \beta_2) \tag{1.49}$$

where $\beta_1$ and $\beta_2$ are arbitrary constants, $r = (-a_2)^{1/2}$, and the value of $\theta$ is chosen so as to simultaneously satisfy

$$\cos(\theta) = a_1/[2(-a_2)^{1/2}] \tag{1.50}$$

The trigonometric functions impart a wavelike pattern to the time path of the homogeneous solution; note that the frequency of the oscillations is determined by $\theta$. Since $\cos(\theta t) = \cos(2\pi + \theta t)$, the stability condition is determined solely by the magnitude of $r = (-a_2)^{1/2}$. If $|a_2| = 1$, the oscillations are of unchanging amplitude; the homogeneous solution is periodic. The oscillations will dampen if $|a_2| < 1$ and explode if $|a_2| > 1$.


EXAMPLE: It is worthwhile to work through an exercise using an equation with imaginary roots. The left-hand side of Worksheet 1.2 examines the behavior of the equation $y_t = 1.6y_{t-1} - 0.9y_{t-2}$. A quick check shows that the discriminant $d$ is negative so that the characteristic roots are imaginary. If we transform to polar coordinates, the value of $r$ is given by $(0.9)^{1/2} = 0.949$. From (1.50), $\cos(\theta) = 1.6/(2 \times 0.949) = 0.843$. You can use a trig table or calculator to show that $\theta = 0.567$ [i.e., if $\cos(\theta) = 0.843$, $\theta = 0.567$]. Thus, the homogeneous solution is

$$y_t^h = \beta_1(0.949)^t \cos(0.567t + \beta_2) \tag{1.51}$$

The graph on the left-hand side of Worksheet 1.2 sets $\beta_1 = 1$ and $\beta_2 = 0$ and plots the homogeneous solution for $t = 1, \ldots, 25$. Case 2 uses the same value of $a_2$ (hence, $r = 0.949$) but sets $a_1 = -0.6$. Again, the value of $d$ is negative; however, for this set of calculations, $\cos(\theta) = -0.316$ so that $\theta$ is 1.89 (the angle whose cosine is $+0.316$ is 1.25; because the cosine here is negative, $\theta = \pi - 1.25$). Comparing the two graphs, you can see that increasing the value of $\theta$ acts to increase the frequency of the oscillations.

WORKSHEET 1.2 IMAGINARY ROOTS

CASE 1: $y_t - 1.6y_{t-1} + 0.9y_{t-2} = 0$
CASE 2: $y_t + 0.6y_{t-1} + 0.9y_{t-2} = 0$

(a) Check the discriminant $d = a_1^2 + 4a_2$.

Case 1: $d = (1.6)^2 - 4(0.9) = -1.04$
Case 2: $d = (-0.6)^2 - 4(0.9) = -3.24$

Hence, the roots are imaginary. The homogeneous solution has the form
$y_t^h = \beta_1 r^t \cos(\theta t + \beta_2)$
where $\beta_1$ and $\beta_2$ are arbitrary constants.

(b) Obtain the value of $r = (-a_2)^{1/2}$.

Case 1: $r = (0.9)^{1/2} = 0.949$
Case 2: $r = (0.9)^{1/2} = 0.949$

(c) Obtain $\theta$ from $\cos(\theta) = a_1/[2(-a_2)^{1/2}]$.

Case 1: $\cos(\theta) = 1.6/[2(0.9)^{1/2}] = 0.843$
Case 2: $\cos(\theta) = -0.6/[2(0.9)^{1/2}] = -0.316$

Given $\cos(\theta)$, use a trig table to find $\theta$.

Case 1: $\theta = 0.567$
Case 2: $\theta = 1.89$

(d) Form the homogeneous solution: $y_t^h = \beta_1 r^t \cos(\theta t + \beta_2)$.

Case 1: $y_t^h = \beta_1(0.949)^t \cos(0.567t + \beta_2)$
Case 2: $y_t^h = \beta_1(0.949)^t \cos(1.89t + \beta_2)$

For $\beta_1 = 1$ and $\beta_2 = 0$, the time paths of the homogeneous solution are

[Graphs: time paths of the homogeneous solutions for Case 1 and Case 2, $t$ = 1 to 25.]
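The polar-coordinate calculation can be checked with a short script. This is a sketch under the assumptions of (1.49) and (1.50); the function names are illustrative. It computes $r$ and $\theta$ from $a_1$ and $a_2$ and confirms that $\beta_1 r^t\cos(\theta t + \beta_2)$ satisfies the homogeneous equation.

```python
import math


def polar_form(a1, a2):
    """Return (r, theta) for the imaginary-root case a1**2 + 4*a2 < 0."""
    if a1 ** 2 + 4 * a2 >= 0:
        raise ValueError("roots are real; the polar form is not needed")
    r = math.sqrt(-a2)                  # r = (-a2)**(1/2)
    theta = math.acos(a1 / (2 * r))     # from cos(theta) = a1 / [2*(-a2)**(1/2)]
    return r, theta


def trig_solution(r, theta, beta1, beta2, t):
    """Evaluate beta1 * r**t * cos(theta*t + beta2)."""
    return beta1 * r ** t * math.cos(theta * t + beta2)


a1, a2 = 1.6, -0.9
r, theta = polar_form(a1, a2)
path = [trig_solution(r, theta, 1.0, 0.0, t) for t in range(26)]

# Residuals of y_t - a1*y_{t-1} - a2*y_{t-2}; they should be numerically zero.
residuals = [path[t] - a1 * path[t - 1] - a2 * path[t - 2] for t in range(2, 26)]
print("r =", r, "theta =", theta)
print("largest residual:", max(abs(x) for x in residuals))
```

Because `math.acos` returns an angle between 0 and $\pi$, a negative value of $\cos(\theta)$ automatically yields an angle greater than $\pi/2$.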


Stability Conditions

The general stability conditions can be summarized using triangle ABC in Figure 1.5. Arc A0B is the boundary between Cases 1 and 3; it is the locus of points such that $d = a_1^2 + 4a_2 = 0$. The region above A0B corresponds to Case 1 (since $d > 0$) and the region below A0B corresponds to Case 3 (since $d < 0$).

[Figure 1.5 Characterizing the stability conditions.]

In Case 1 (in which the roots are real and distinct), stability requires that the largest root be less than unity and the smallest root be greater than $-1$. The largest characteristic root, $\alpha_1 = (a_1 + \sqrt{d})/2$, will be less than unity if

$$a_1 + (a_1^2 + 4a_2)^{1/2} < 2 \quad \text{or} \quad (a_1^2 + 4a_2)^{1/2} < 2 - a_1$$

Hence, $a_1^2 + 4a_2 < 4 - 4a_1 + a_1^2$

or

$$a_1 + a_2 < 1 \tag{1.52}$$

The smallest root, $\alpha_2 = (a_1 - \sqrt{d})/2$, will be greater than $-1$ if

$$a_1 - (a_1^2 + 4a_2)^{1/2} > -2 \quad \text{or} \quad 2 + a_1 > (a_1^2 + 4a_2)^{1/2}$$

Hence, $4 + 4a_1 + a_1^2 > a_1^2 + 4a_2$

or

$$a_2 < 1 + a_1 \tag{1.53}$$

Thus, the region of stability in Case 1 consists of all points in the region bounded by A0BC. For any point in A0BC, conditions (1.52) and (1.53) hold and $d > 0$.

In Case 2 (repeated roots), $a_1^2 + 4a_2 = 0$. The stability condition is $|a_1| < 2$. Hence, the region of stability in Case 2 consists of all points on arc A0B. In Case 3 ($d < 0$), the stability condition is $r = (-a_2)^{1/2} < 1$. Hence,

$$-a_2 < 1 \quad (\text{where } a_2 < 0) \tag{1.54}$$

Thus, the region of stability in Case 3 consists of all points in region A0B. For any point in A0B, (1.54) is satisfied and $d < 0$.

A succinct way to characterize the stability conditions is to state that the characteristic roots must lie within the unit circle. Consider the semicircle drawn in Figure 1.6. Real numbers are measured on the horizontal axis and imaginary numbers on the vertical axis. If the characteristic roots $\alpha_1$ and $\alpha_2$ are both real, they can be plotted on the horizontal axis. Stability requires that they lie within a circle of radius 1. Complex roots will lie somewhere in the complex plane. If $a_1 > 0$, the roots $\alpha_1 = (a_1 + i\sqrt{-d})/2$ and $\alpha_2 = (a_1 - i\sqrt{-d})/2$ can be represented by the two points shown in Figure 1.6. For example, $\alpha_1$ is drawn by moving $a_1/2$ units along the real axis and $\sqrt{-d}/2$ units along the imaginary axis. Using the distance formula, we can give the length of the radius $r$ by

$$r = \sqrt{(a_1/2)^2 + (\sqrt{-d}/2)^2}$$

and using the fact that $d = a_1^2 + 4a_2$, we obtain

$$r = (-a_2)^{1/2}$$

[Figure 1.6 Characteristic roots and the unit circle.]

The stability condition requires that $r < 1$. Hence, when plotted on the complex plane, the two roots $\alpha_1$ and $\alpha_2$ must lie within a circle of radius equal to unity. In the time-series literature, it is simply stated that stability requires that all characteristic roots lie within the unit circle.
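The triangle conditions (1.52) to (1.54) and the unit-circle statement describe the same region. The sketch below (function names are illustrative) checks a coefficient pair both ways: once with the three inequalities and once by computing the moduli of the characteristic roots directly.

```python
import cmath


def stable_by_triangle(a1, a2):
    """Conditions (1.52)-(1.54): a1 + a2 < 1, a2 < 1 + a1, and -a2 < 1."""
    return (a1 + a2 < 1) and (a2 < 1 + a1) and (-a2 < 1)


def stable_by_roots(a1, a2):
    """Both roots of alpha**2 - a1*alpha - a2 = 0 lie inside the unit circle."""
    sqrt_d = cmath.sqrt(a1 ** 2 + 4 * a2)
    roots = ((a1 + sqrt_d) / 2, (a1 - sqrt_d) / 2)
    return max(abs(root) for root in roots) < 1


for a1, a2 in [(0.2, 0.35), (0.7, 0.35), (1.6, -0.9), (0.5, -1.2)]:
    print(a1, a2, stable_by_triangle(a1, a2), stable_by_roots(a1, a2))
```

Using `abs` on a complex root returns its modulus, which is exactly the distance from the origin used in the unit-circle argument.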


Higher-Order Systems

The same method can be used to find the homogeneous solution to higher-order difference equations. The homogeneous equation for (1.10) is

$$y_t - \sum_{i=1}^{n} a_i y_{t-i} = 0 \tag{1.55}$$

Given the results in Section 4, you should suspect each homogeneous solution to have the form $y_t^h = A\alpha^t$, where $A$ is an arbitrary constant. Thus, to find the value(s) of $\alpha$, we seek the solution for

$$A\alpha^t - \sum_{i=1}^{n} a_i A\alpha^{t-i} = 0 \tag{1.56}$$

or, dividing through by $\alpha^{t-n}$, we seek the values of $\alpha$ that solve

$$\alpha^n - a_1\alpha^{n-1} - a_2\alpha^{n-2} - \cdots - a_n = 0 \tag{1.57}$$

This $n$th-order polynomial will yield $n$ solutions for $\alpha$. Denote these $n$ characteristic roots by $\alpha_1, \alpha_2, \ldots, \alpha_n$. Given the results in Section 4, the linear combination $A_1\alpha_1^t + A_2\alpha_2^t + \cdots + A_n\alpha_n^t$ is also a solution. The arbitrary constants $A_1$ through $A_n$ can be eliminated by imposing $n$ initial conditions on the general solution. The $\alpha_i$ may be real or complex numbers. Stability requires that all real-valued $\alpha_i$ be less than unity in absolute value. Complex roots will necessarily come in pairs. Stability requires that all roots lie within the unit circle shown in Figure 1.6.

In most circumstances, there is little need to directly calculate the characteristic roots of higher-order systems. Many of the technical details are included in Appendix 2 to this chapter. However, there are some useful rules to check the stability conditions in higher-order systems.

1. In an $n$th-order equation, a necessary condition for all characteristic roots to lie inside the unit circle is

$$\sum_{i=1}^{n} a_i < 1$$

2. Since the values of the $a_i$ can be positive or negative, a sufficient condition for all characteristic roots to lie inside the unit circle is

$$\sum_{i=1}^{n} |a_i| < 1$$

3. At least one characteristic root equals unity if

$$\sum_{i=1}^{n} a_i = 1$$

Any sequence that contains one or more characteristic roots that equal unity is called a unit root process.

4. For a third-order equation, the stability conditions can be written as

$$1 - a_1 - a_2 - a_3 > 0$$
$$1 + a_1 - a_2 + a_3 > 0$$
$$1 - a_1a_3 + a_2 - a_3^2 > 0$$
$$3 + a_1 + a_2 - 3a_3 > 0 \quad \text{or} \quad 3 - a_1 + a_2 + 3a_3 > 0$$

Given that the first three inequalities are satisfied, either of the last two can be checked. One of the last conditions is redundant given that the other three hold.
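Rules 1 through 3 require only sums of the coefficients, so they are simple to automate. The following is a minimal sketch; the function names and the example coefficient lists are illustrative and not taken from the text.

```python
def char_poly(alpha, coeffs):
    """Evaluate alpha**n - a1*alpha**(n-1) - ... - an, as in (1.57)."""
    n = len(coeffs)
    return alpha ** n - sum(a * alpha ** (n - i) for i, a in enumerate(coeffs, start=1))


def necessary_condition(coeffs):
    """Rule 1: the sum of the a_i must be less than 1."""
    return sum(coeffs) < 1


def sufficient_condition(coeffs):
    """Rule 2: the sum of the |a_i| is less than 1."""
    return sum(abs(a) for a in coeffs) < 1


def has_unit_root(coeffs, tol=1e-12):
    """Rule 3: alpha = 1 solves (1.57) exactly when the a_i sum to 1."""
    return abs(char_poly(1.0, coeffs)) < tol


print(sufficient_condition([0.4, -0.3, 0.2]))   # passes the sufficient test
print(necessary_condition([0.7, 0.35]))         # fails the necessary test
print(has_unit_root([0.5, 0.3, 0.2]))           # coefficients sum to 1
# Rule 1 is necessary, not sufficient: a1 = -1.5 passes it, yet the root is -1.5.
print(necessary_condition([-1.5]), sufficient_condition([-1.5]))
```

A tolerance is used in `has_unit_root` because sums of decimal fractions are not always exact in floating-point arithmetic.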


7. FINDING PARTICULAR SOLUTIONS FOR DETERMINISTIC PROCESSES


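Finding the particular solution to a difference equation often takes ingenuity and perseverance. The appropriate technique depends on the form of the $\{x_t\}$ process; this section considers processes that contain only deterministic components. Of course, in econometric analysis, the forcing process will typically contain both deterministic and stochastic components.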


CASE 1

$x_t = 0$. When all elements of the $\{x_t\}$ process are zero, the difference equation becomes

$$y_t = a_0 + a_1y_{t-1} + a_2y_{t-2} + \cdots + a_ny_{t-n} \tag{1.58}$$

Try the constant solution $y_t = c$. Substituting into (1.58) gives

$$c = a_0 + a_1c + a_2c + \cdots + a_nc$$

so that

$$c = a_0/(1 - a_1 - a_2 - \cdots - a_n) \tag{1.59}$$

As long as $(1 - a_1 - a_2 - \cdots - a_n)$ does not equal zero, the value of $c$ given by (1.59) is a solution to (1.58). Hence, the particular solution to (1.58) is given by $y_t^p = a_0/(1 - a_1 - a_2 - \cdots - a_n)$.

If $1 - a_1 - a_2 - \cdots - a_n = 0$, the value of $c$ in (1.59) is undefined; it is necessary to try some other form for the solution. The key insight is that $\{y_t\}$ is a unit root process if $\sum a_i = 1$. Since $\{y_t\}$ is not convergent, it stands to reason that the constant solution does not work. Instead, recall equations (1.12) and (1.26); these solutions suggest that a linear time trend can appear in the solution of a unit root process. As such, try the solution $y_t^p = ct$. For $ct$ to be a solution, it must be the case that

$$ct = a_0 + a_1c(t-1) + a_2c(t-2) + \cdots + a_nc(t-n)$$

or combining like terms, we obtain

$$(1 - a_1 - a_2 - \cdots - a_n)ct = a_0 - c(a_1 + 2a_2 + 3a_3 + \cdots + na_n)$$

Since $1 - a_1 - a_2 - \cdots - a_n = 0$, select the value of $c$ such that

$$c = a_0/(a_1 + 2a_2 + 3a_3 + \cdots + na_n)$$

For example, let

$$y_t = 2 + 0.75y_{t-1} + 0.25y_{t-2}$$

Here, $a_1 = 0.75$ and $a_2 = 0.25$; $\{y_t\}$ is a unit root process since $a_1 + a_2 = 1$. The particular solution has the form $ct$, where $c = 2/[0.75 + 2(0.25)] = 1.6$. In the event that the solution $ct$ fails, sequentially try the solutions $y_t^p = ct^2, ct^3, \ldots, ct^n$. For an $n$th-order equation, one of these solutions will always be the particular solution.
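The two sub-cases (constant solution or linear trend) can be sketched as a small helper. The function name and return convention below are assumptions made for illustration; the example reuses the unit root equation $y_t = 2 + 0.75y_{t-1} + 0.25y_{t-2}$.

```python
def deterministic_particular(a0, coeffs, tol=1e-12):
    """Particular solution of y_t = a0 + sum_i a_i*y_{t-i} when x_t = 0.

    Returns ("constant", c) for y_t = c, or ("trend", c) for y_t = c*t.
    """
    denom = 1 - sum(coeffs)
    if abs(denom) > tol:
        return "constant", a0 / denom                 # equation (1.59)
    weighted = sum(i * a for i, a in enumerate(coeffs, start=1))
    if abs(weighted) < tol:
        raise ValueError("c*t fails as well; try c*t**2, c*t**3, ...")
    return "trend", a0 / weighted                     # c = a0/(a1 + 2*a2 + ...)


kind, c = deterministic_particular(2.0, [0.75, 0.25])
print(kind, c)

# Check that y_t = c*t satisfies the equation at several dates.
residuals = [c * t - (2.0 + 0.75 * c * (t - 1) + 0.25 * c * (t - 2)) for t in range(2, 12)]
print("largest residual:", max(abs(r) for r in residuals))
```

The tolerance `tol` guards against treating a sum that differs from 1 only by rounding error as a stable case.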


CASE 2

The Exponential Case. Let $x_t$ have the exponential form $b(d)^{rt}$, where $b$, $d$, and $r$ are constants. Since $r$ has the natural interpretation as a growth rate, we would expect to encounter this type of forcing process in a growth context. We illustrate the solution procedure using the first-order equation:

$$y_t = a_0 + a_1y_{t-1} + bd^{rt} \tag{1.60}$$

To try to gain an intuitive feel for the form of the solution, notice that if $b = 0$, (1.60) is a special case of (1.58). Hence, you should expect a constant to appear in the particular solution. Moreover, the expression $d^{rt}$ grows at the constant rate $r$. Thus, you might expect the particular solution to have the form $y_t^p = c_0 + c_1d^{rt}$, where $c_0$ and $c_1$ are constants. If this equation is actually a solution, you should be able to substitute it back into (1.60) and obtain an identity. Making the appropriate substitutions, we get

$$c_0 + c_1d^{rt} = a_0 + a_1[c_0 + c_1d^{r(t-1)}] + bd^{rt} \tag{1.61}$$

For this solution to "work," it is necessary to select $c_0$ and $c_1$ such that

$$c_0 = a_0/(1 - a_1) \quad \text{and} \quad c_1 = bd^r/(d^r - a_1)$$

Thus, a particular solution is

$$y_t^p = [a_0/(1 - a_1)] + [bd^r/(d^r - a_1)]d^{rt}$$

The nature of the solution is that $y_t^p$ equals the constant $a_0/(1 - a_1)$ plus an expression that grows at the rate $r$. Note that for $|d^r| < 1$, the particular solution converges to $a_0/(1 - a_1)$.

If either $a_1 = 1$ or $a_1 = d^r$, use the "trick" suggested in Case 1. If $a_1 = 1$, try the solution $c_0 = ct$, and if $a_1 = d^r$, try the solution $c_1 = tb$ (the expression $bd^r/(d^r - a_1)$ is undefined in this case, and substituting $c_1 = tb$ into (1.60) yields an identity). Use precisely the same methodology in higher-order systems.
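A numerical check of the exponential case is sketched below. The parameter values are hypothetical and chosen only so that $a_1 \neq 1$ and $a_1 \neq d^r$; the function name is illustrative.

```python
def exponential_particular(a0, a1, b, d, r):
    """Return (c0, c1) so that y_t = c0 + c1*d**(r*t) solves (1.60)."""
    growth = d ** r
    if a1 == 1 or a1 == growth:
        raise ValueError("special case: multiply the trial solution by t")
    c0 = a0 / (1 - a1)
    c1 = b * growth / (growth - a1)
    return c0, c1


# Hypothetical parameter values, not taken from the text.
a0, a1, b, d, r = 1.0, 0.5, 2.0, 1.1, 1.0
c0, c1 = exponential_particular(a0, a1, b, d, r)


def y_p(t):
    """Particular solution c0 + c1*d**(r*t)."""
    return c0 + c1 * d ** (r * t)


# Residuals of y_t - (a0 + a1*y_{t-1} + b*d**(r*t)); they should be zero.
residuals = [y_p(t) - (a0 + a1 * y_p(t - 1) + b * d ** (r * t)) for t in range(1, 11)]
print(c0, c1, max(abs(x) for x in residuals))
```

If the residuals are not numerically zero, the trial form is wrong for the chosen parameters, which is exactly the diagnostic used by the method.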


CASE 3

Deterministic Time Trend. In this case, let the $\{x_t\}$ sequence be represented by the relationship $x_t = bt^d$, where $b$ is a constant and $d$ a positive integer. Hence,

$$y_t = a_0 + \sum_{i=1}^{n} a_iy_{t-i} + bt^d \tag{1.62}$$

Since $y_t$ depends on $t^d$, it follows that $y_{t-1}$ depends on $(t-1)^d$, $y_{t-2}$ depends on $(t-2)^d$, etc. As such, the particular solution has the form $y_t^p = c_0 + c_1t + c_2t^2 + \cdots + c_dt^d$. To find the values of the $c_i$, substitute the particular solution into (1.62). Then select the value of each $c_i$ that results in an identity. Although various values of $d$ are possible, in economic applications it is common to see models incorporating a linear time trend ($d = 1$). For illustrative purposes, consider the second-order equation $y_t = a_0 + a_1y_{t-1} + a_2y_{t-2} + bt$. Posit the solution $y_t^p = c_0 + c_1t$, where $c_0$ and $c_1$ are undetermined coefficients. Substituting this "challenge solution" into the second-order difference equation yields

$$c_0 + c_1t = a_0 + a_1[c_0 + c_1(t-1)] + a_2[c_0 + c_1(t-2)] + bt \tag{1.63}$$

Now select values of $c_0$ and $c_1$ so as to force Equation (1.63) to be an identity for all possible values of $t$. If we combine all constant terms and all terms involving $t$, the required values of $c_0$ and $c_1$ are

$$c_1 = b/(1 - a_1 - a_2)$$
$$c_0 = [a_0 - (2a_2 + a_1)c_1]/(1 - a_1 - a_2)$$

so that

$$c_0 = [a_0/(1 - a_1 - a_2)] - [b/(1 - a_1 - a_2)^2](2a_2 + a_1)$$

Thus, the particular solution will also contain a linear time trend. You should have no difficulty foreseeing the solution technique if $a_1 + a_2 = 1$. In this circumstance (which is applicable to higher-order cases also) try multiplying the original challenge solution by $t$.
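The coefficients of the linear-trend solution can be verified in the same way. The sketch below uses hypothetical parameter values and an illustrative function name; it assumes $a_1 + a_2 \neq 1$.

```python
def linear_trend_particular(a0, a1, a2, b):
    """Return (c0, c1) so that y_t = c0 + c1*t solves y_t = a0 + a1*y_{t-1} + a2*y_{t-2} + b*t."""
    denom = 1 - a1 - a2
    if denom == 0:
        raise ValueError("a1 + a2 = 1: multiply the challenge solution by t")
    c1 = b / denom
    c0 = (a0 - (2 * a2 + a1) * c1) / denom
    return c0, c1


# Hypothetical parameter values, not taken from the text.
a0, a1, a2, b = 3.0, 0.9, -0.2, 0.5
c0, c1 = linear_trend_particular(a0, a1, a2, b)

# Substitute the challenge solution into (1.63) at several dates.
residuals = [
    (c0 + c1 * t) - (a0 + a1 * (c0 + c1 * (t - 1)) + a2 * (c0 + c1 * (t - 2)) + b * t)
    for t in range(2, 12)
]
print(c0, c1, max(abs(x) for x in residuals))
```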


8. THE METHOD OF UNDETERMINED COEFFICIENTS 


At this point, it is appropriate to introduce the first of two useful methods of finding particular solutions when there are stochastic components in the $\{y_t\}$ process. The key insight of the method of undetermined coefficients is that the particular solution to a linear difference equation is necessarily linear. Moreover, the solution can depend only on time, a constant, and the elements of the forcing process $\{x_t\}$. Thus, it is often possible to know the exact form of the solution even though the coefficients of the solution are unknown. The technique involves positing a solution, called a challenge solution, that is a linear function of all terms thought to appear in the actual solution. The problem becomes one of finding the set of values for these undetermined coefficients that solve the difference equation.

The actual technique for finding the coefficients is straightforward. Substitute the challenge solution into the original difference equation and solve for the values of the undetermined coefficients that yield an identity for all possible values of the included variables. If it is not possible to obtain an identity, the form of the challenge solution is incorrect. Try a new trial solution and repeat the process. In fact, we used the method of undetermined coefficients when positing the challenge solutions $y_t^p = c_0 + c_1d^{rt}$ and $y_t^p = c_0 + c_1t$ for Cases 2 and 3 in Section 7.

To begin, reconsider the simple first-order equation $y_t = a_0 + a_1y_{t-1} + \varepsilon_t$. Since you have solved this equation using the iterative method, the equation is useful for illustrating the method of undetermined coefficients. The nature of the $\{y_t\}$ process is such that the particular solution can depend only on a constant term, time, and the individual elements of the $\{\varepsilon_t\}$ sequence. Since $t$ does not explicitly appear in the forcing process, $t$ can be in the particular solution only if the characteristic root is unity. Since the goal is to illustrate the method, posit the challenge solution:

$$y_t = b_0 + b_1t + \sum_{i=0}^{\infty} \alpha_i\varepsilon_{t-i} \tag{1.64}$$

where $b_0$, $b_1$, and all the $\alpha_i$ are the coefficients to be determined. Substitute (1.64) into the original difference equation to form

$$b_0 + b_1t + \alpha_0\varepsilon_t + \alpha_1\varepsilon_{t-1} + \alpha_2\varepsilon_{t-2} + \cdots = a_0 + a_1[b_0 + b_1(t-1) + \alpha_0\varepsilon_{t-1} + \alpha_1\varepsilon_{t-2} + \cdots] + \varepsilon_t$$

Collecting like terms, we obtain

$$(b_0 - a_0 - a_1b_0 + a_1b_1) + b_1(1 - a_1)t + (\alpha_0 - 1)\varepsilon_t + (\alpha_1 - a_1\alpha_0)\varepsilon_{t-1} + (\alpha_2 - a_1\alpha_1)\varepsilon_{t-2} + (\alpha_3 - a_1\alpha_2)\varepsilon_{t-3} + \cdots = 0 \tag{1.65}$$

Equation (1.65) must hold for all values of $t$ and all possible values of the $\{\varepsilon_t\}$ sequence. Thus, each of the following conditions must hold:

$$\alpha_0 - 1 = 0$$
$$\alpha_1 - a_1\alpha_0 = 0$$
$$\alpha_2 - a_1\alpha_1 = 0$$
$$\vdots$$
$$b_0 - a_0 - a_1b_0 + a_1b_1 = 0$$
$$b_1 - a_1b_1 = 0$$

Notice that the first set of conditions can be solved for the $\alpha_i$ recursively. The solution of the first condition entails setting $\alpha_0 = 1$. Given this solution for $\alpha_0$, the next equation requires $\alpha_1 = a_1$. Moving down the list, we obtain $\alpha_2 = a_1\alpha_1$ or $\alpha_2 = a_1^2$. Continuing the recursive process, we find $\alpha_i = a_1^i$. Now consider the last two equations. There are two possible cases depending on the value of $a_1$. If $a_1 \neq 1$, it immediately follows that $b_1 = 0$ and $b_0 = a_0/(1 - a_1)$. For this case, the particular solution is

$$y_t = [a_0/(1 - a_1)] + \sum_{i=0}^{\infty} a_1^i\varepsilon_{t-i}$$

Compare this result to (1.21); you will see that it is precisely the same solution found using the iterative method. The general solution is the sum of this particular solution plus the homogeneous solution $Aa_1^t$. Hence, the general solution is

$$y_t = [a_0/(1 - a_1)] + \sum_{i=0}^{\infty} a_1^i\varepsilon_{t-i} + Aa_1^t$$

Now, if there is an initial condition for $y_0$, it follows that

$$y_0 = [a_0/(1 - a_1)] + \sum_{i=0}^{\infty} a_1^i\varepsilon_{-i} + A$$

Combining these two equations so as to eliminate the arbitrary constant $A$, we obtain

$$y_t = [a_0/(1 - a_1)] + \sum_{i=0}^{\infty} a_1^i\varepsilon_{t-i} + a_1^t\left\{y_0 - [a_0/(1 - a_1)] - \sum_{i=0}^{\infty} a_1^i\varepsilon_{-i}\right\}$$

so that

$$y_t = [a_0/(1 - a_1)] + \sum_{i=0}^{t-1} a_1^i\varepsilon_{t-i} + a_1^t\{y_0 - [a_0/(1 - a_1)]\} \tag{1.66}$$

It is easily verified that (1.66) is identical to (1.25). Instead, if $a_1 = 1$, $b_0$ can be any arbitrary constant and $b_1 = a_0$. The improper form of the solution is

$$y_t = b_0 + a_0t + \sum_{i=0}^{\infty} \varepsilon_{t-i}$$

The form of the solution is "improper" since the sum of the $\{\varepsilon_t\}$ sequence may not be finite. Hence, it is necessary to impose an initial condition. If the value $y_0$ is given, it follows that

$$y_0 = b_0 + \sum_{i=0}^{\infty} \varepsilon_{-i}$$

Imposing the initial condition on the improper form of the solution yields (1.26):

$$y_t = y_0 + a_0t + \sum_{i=1}^{t} \varepsilon_i$$

To take a second example, consider the equation

$$y_t = a_0 + a_1y_{t-1} + \varepsilon_t + \beta_1\varepsilon_{t-1} \tag{1.67}$$

Again, the solution can depend only on a constant, the elements of the $\{\varepsilon_t\}$ sequence, and $t$ raised to the first power. As in the previous example, $t$ does not need to be included in the challenge solution if the characteristic root differs from unity. To reinforce this point, use the challenge solution given by (1.64). Substitute this tentative solution into (1.67) to obtain

$$b_0 + b_1t + \sum_{i=0}^{\infty} \alpha_i\varepsilon_{t-i} = a_0 + a_1\left[b_0 + b_1(t-1) + \sum_{i=0}^{\infty} \alpha_i\varepsilon_{t-1-i}\right] + \varepsilon_t + \beta_1\varepsilon_{t-1}$$

Matching coefficients on all terms containing $\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots$ yields

$$\alpha_0 = 1$$
$$\alpha_1 = a_1\alpha_0 + \beta_1 \quad [\text{so that } \alpha_1 = a_1 + \beta_1]$$
$$\alpha_2 = a_1\alpha_1 \quad [\text{so that } \alpha_2 = a_1(a_1 + \beta_1)]$$
$$\alpha_3 = a_1\alpha_2 \quad [\text{so that } \alpha_3 = (a_1)^2(a_1 + \beta_1)]$$
$$\vdots$$
$$\alpha_i = a_1\alpha_{i-1} \quad [\text{so that } \alpha_i = (a_1)^{i-1}(a_1 + \beta_1)]$$

Matching coefficients of intercept terms and coefficients of terms containing $t$, we get

$$b_0 = a_0 + a_1b_0 - a_1b_1$$
$$b_1 = a_1b_1$$

Again, there are two cases. If $a_1 \neq 1$, then $b_1 = 0$ and $b_0 = a_0/(1 - a_1)$. The particular solution is

$$y_t = [a_0/(1 - a_1)] + \varepsilon_t + (a_1 + \beta_1)\sum_{i=1}^{\infty} a_1^{i-1}\varepsilon_{t-i}$$

The general solution augments the particular solution with the term $Aa_1^t$. You are left with the exercise of imposing the initial condition for $y_0$ on the general solution. Now consider the case in which $a_1 = 1$. The undetermined coefficients are such that $b_1 = a_0$ and $b_0$ is an arbitrary constant. The improper form of the solution is

$$y_t = b_0 + a_0t + \varepsilon_t + (1 + \beta_1)\sum_{i=1}^{\infty} \varepsilon_{t-i}$$

If $y_0$ is given, it follows that

$$y_0 = b_0 + \varepsilon_0 + (1 + \beta_1)\sum_{i=1}^{\infty} \varepsilon_{-i}$$

Hence, imposing the initial condition, we obtain

$$y_t = y_0 + a_0t + \varepsilon_t + \beta_1\varepsilon_0 + (1 + \beta_1)\sum_{i=1}^{t-1} \varepsilon_i$$

(the term $\beta_1\varepsilon_0$ vanishes if $\varepsilon_0$ is taken to be zero).
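The recursive conditions on the $\alpha_i$ for equation (1.67) can be checked against the response of the difference equation to a single unit shock. This is a sketch only: the function names and the parameter values $a_1 = 0.8$, $\beta_1 = 0.3$ are assumptions chosen for illustration.

```python
def impulse_response(a1, beta1, n):
    """Response of y_t = a1*y_{t-1} + eps_t + beta1*eps_{t-1} to a unit shock at t = 0."""
    shocks = [1.0] + [0.0] * (n - 1)
    path, prev_y, prev_eps = [], 0.0, 0.0
    for eps in shocks:
        y = a1 * prev_y + eps + beta1 * prev_eps
        path.append(y)
        prev_y, prev_eps = y, eps
    return path


def undetermined_coefficients(a1, beta1, n):
    """alpha_0 = 1 and alpha_i = a1**(i-1) * (a1 + beta1) for i >= 1."""
    return [1.0] + [(a1 + beta1) * a1 ** (i - 1) for i in range(1, n)]


simulated = impulse_response(0.8, 0.3, 12)
formula = undetermined_coefficients(0.8, 0.3, 12)
print(max(abs(s - f) for s, f in zip(simulated, formula)))
```

Setting `beta1 = 0.0` reduces the weights to $a_1^i$, the result obtained for the first-order equation without the lagged shock.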


Higher-Order Systems

The identical procedure is used for higher-order systems. As an example, let us find the particular solution to the second-order equation:

$$y_t = a_0 + a_1y_{t-1} + a_2y_{t-2} + \varepsilon_t \tag{1.68}$$

Since we have a second-order equation, we use the challenge solution:

$$y_t = b_0 + b_1t + b_2t^2 + \alpha_0\varepsilon_t + \alpha_1\varepsilon_{t-1} + \alpha_2\varepsilon_{t-2} + \cdots$$

where $b_0$, $b_1$, $b_2$, and the $\alpha_i$ are the undetermined coefficients. Substituting the challenge solution into (1.68) yields

$$(b_0 + b_1t + b_2t^2) + \alpha_0\varepsilon_t + \alpha_1\varepsilon_{t-1} + \alpha_2\varepsilon_{t-2} + \cdots = a_0 + a_1[b_0 + b_1(t-1) + b_2(t-1)^2 + \alpha_0\varepsilon_{t-1} + \alpha_1\varepsilon_{t-2} + \alpha_2\varepsilon_{t-3} + \cdots] + a_2[b_0 + b_1(t-2) + b_2(t-2)^2 + \alpha_0\varepsilon_{t-2} + \alpha_1\varepsilon_{t-3} + \alpha_2\varepsilon_{t-4} + \cdots] + \varepsilon_t$$

The necessary and sufficient conditions for the values of the $\alpha_i$ to render the equation above an identity for all possible realizations of the $\{\varepsilon_t\}$ sequence are

$$\alpha_0 = 1$$
$$\alpha_1 = a_1\alpha_0 \quad [\text{so that } \alpha_1 = a_1]$$
$$\alpha_2 = a_1\alpha_1 + a_2\alpha_0 \quad [\text{so that } \alpha_2 = (a_1)^2 + a_2]$$
$$\alpha_3 = a_1\alpha_2 + a_2\alpha_1 \quad [\text{so that } \alpha_3 = (a_1)^3 + 2a_1a_2]$$
$$\vdots$$

Notice that for any value of $j \geq 2$, the coefficients solve the second-order difference equation $\alpha_j = a_1\alpha_{j-1} + a_2\alpha_{j-2}$. Since we know $\alpha_0$ and $\alpha_1$, we can solve for all the $\alpha_j$ iteratively. The properties of the coefficients will be precisely those discussed when considering homogeneous solutions, namely the following:

1. Convergence necessitates $|a_2| < 1$, $a_1 + a_2 < 1$, and $a_2 - a_1 < 1$. Notice that convergence implies that past values of the $\{\varepsilon_t\}$ sequence ultimately have a successively smaller influence on the current value of $y_t$.

2. If the coefficients converge, convergence will be direct if $(a_1^2 + 4a_2) > 0$, will follow a sine/cosine pattern if $(a_1^2 + 4a_2) < 0$, and will "explode" and then converge if $(a_1^2 + 4a_2) = 0$.

Appropriately setting the $\alpha_i$, we are left with the remaining expression:

$$b_2(1 - a_1 - a_2)t^2 + [b_1(1 - a_1 - a_2) + 2b_2(a_1 + 2a_2)]t + [b_0(1 - a_1 - a_2) - a_0 + a_1(b_1 - b_2) + 2a_2(b_1 - 2b_2)] = 0 \tag{1.69}$$


Equation (1.69) must equal zero for all values of $t$. First, consider the case in which $a_1 + a_2 \neq 1$. Since $(1 - a_1 - a_2)$ does not vanish, it is necessary to set the value of $b_2$ equal to zero. Given that $b_2 = 0$ and the coefficient of $t$ must equal zero, it follows that $b_1$ must also be set equal to zero. Finally, given that $b_1 = b_2 = 0$, we must set $b_0 = a_0/(1 - a_1 - a_2)$. Instead, if $a_1 + a_2 = 1$, the solutions for the $b_i$ depend on the specific values of $a_0$, $a_1$, and $a_2$. The key point is that when a characteristic root equals unity, a polynomial time trend will appear in the particular solution. The order of the polynomial is the number of unitary characteristic roots. This result generalizes to higher-order equations.

The discussion of the last section can be combined with the method of undetermined coefficients: find the deterministic portion of the particular solution using the techniques of Section 7, and then use the method of undetermined coefficients to find the stochastic portion of the particular solution. In (1.67), for example, set $\varepsilon_t = \varepsilon_{t-1} = 0$ and obtain the solution $a_0/(1 - a_1)$. Now use the method of undetermined coefficients to find the particular solution of $y_t = a_1y_{t-1} + \varepsilon_t + \beta_1\varepsilon_{t-1}$. Add together the deterministic and stochastic components to obtain all components of the particular solution.


A Solved Problem

To illustrate the methodology using a second-order equation, augment (1.28) with the stochastic term $\varepsilon_t$ so that

$$y_t = 3 + 0.9y_{t-1} - 0.2y_{t-2} + \varepsilon_t \tag{1.70}$$

You have already verified that the two homogeneous solutions are $A_1(0.5)^t$ and $A_2(0.4)^t$ and the deterministic portion of the particular solution is $y_t^p = 10$. To find the stochastic portion of the particular solution, form the challenge solution:

$$y_t = \sum_{i=0}^{\infty} \alpha_i\varepsilon_{t-i}$$

For this challenge solution to "work," it must satisfy

$$\alpha_0\varepsilon_t + \alpha_1\varepsilon_{t-1} + \alpha_2\varepsilon_{t-2} + \alpha_3\varepsilon_{t-3} + \cdots = 0.9(\alpha_0\varepsilon_{t-1} + \alpha_1\varepsilon_{t-2} + \alpha_2\varepsilon_{t-3} + \alpha_3\varepsilon_{t-4} + \cdots) - 0.2(\alpha_0\varepsilon_{t-2} + \alpha_1\varepsilon_{t-3} + \alpha_2\varepsilon_{t-4} + \alpha_3\varepsilon_{t-5} + \cdots) + \varepsilon_t \tag{1.71}$$

Since (1.71) must hold for all possible realizations of $\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots$, each of the following conditions must hold:

$$\alpha_0 = 1$$
$$\alpha_1 = 0.9\alpha_0$$

so that $\alpha_1 = 0.9$, and for all $i \geq 2$,

$$\alpha_i = 0.9\alpha_{i-1} - 0.2\alpha_{i-2} \tag{1.72}$$

Now, it is possible to solve (1.72) iteratively so that $\alpha_2 = 0.9\alpha_1 - 0.2\alpha_0 = 0.61$, $\alpha_3 = 0.9(0.61) - 0.2(0.9) = 0.369$, etc. A more elegant solution method is to view (1.72) as a second-order difference equation in the $\alpha_i$ with initial conditions $\alpha_0 = 1$ and $\alpha_1 = 0.9$. The solution to (1.72) is

$$\alpha_i = 5(0.5)^i - 4(0.4)^i \tag{1.73}$$

To obtain (1.73), note that the solution to (1.72) is $\alpha_i = A_3(0.5)^i + A_4(0.4)^i$, where $A_3$ and $A_4$ are arbitrary constants. Imposing the conditions $\alpha_0 = 1$ and $\alpha_1 = 0.9$ yields (1.73). If we use (1.73), it follows that $\alpha_0 = 5(0.5)^0 - 4(0.4)^0 = 1$; $\alpha_1 = 5(0.5)^1 - 4(0.4)^1 = 0.9$; $\alpha_2 = 5(0.5)^2 - 4(0.4)^2 = 0.61$, etc.

The general solution to (1.70) is the sum of the two homogeneous solutions and the deterministic and stochastic portions of the particular solution:

$$y_t = 10 + A_1(0.5)^t + A_2(0.4)^t + \sum_{i=0}^{\infty} \alpha_i\varepsilon_{t-i} \tag{1.74}$$

where the $\alpha_i$ are given by (1.73).

Given initial conditions for $y_0$ and $y_1$, it follows that $A_1$ and $A_2$ must satisfy

$$y_0 = 10 + A_1 + A_2 + \sum_{i=0}^{\infty} \alpha_i\varepsilon_{-i} \tag{1.75}$$
$$y_1 = 10 + A_1(0.5) + A_2(0.4) + \sum_{i=0}^{\infty} \alpha_i\varepsilon_{1-i} \tag{1.76}$$

Although the algebra becomes messy, (1.75) and (1.76) can be substituted into (1.74) to eliminate the arbitrary constants:

$$y_t = 10 + (0.4)^t[5(y_0 - 10) - 10(y_1 - 10)] + (0.5)^t[10(y_1 - 10) - 4(y_0 - 10)] + \sum_{i=0}^{t-2} \alpha_i\varepsilon_{t-i}$$
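The solved problem lends itself to a direct numerical check. The sketch below (function names, seed, and initial values are illustrative assumptions) compares the iterated weights from (1.72) with the closed form (1.73), and then compares a simulated path of (1.70) with the final expression in terms of $y_0$ and $y_1$.

```python
import random


def alpha(i):
    """Closed-form weight from (1.73)."""
    return 5 * 0.5 ** i - 4 * 0.4 ** i


def alpha_recursive(n):
    """First n weights from (1.72), starting from alpha_0 = 1 and alpha_1 = 0.9."""
    weights = [1.0, 0.9]
    for _ in range(2, n):
        weights.append(0.9 * weights[-1] - 0.2 * weights[-2])
    return weights[:n]


def simulate(y0, y1, shocks):
    """Iterate (1.70) forward; shocks[t] is eps_t, and shocks[0], shocks[1] are unused."""
    y = [y0, y1]
    for t in range(2, len(shocks)):
        y.append(3 + 0.9 * y[-1] - 0.2 * y[-2] + shocks[t])
    return y


def closed_form(y0, y1, shocks, t):
    """Final expression of the solved problem, valid for t >= 2."""
    deterministic = (10 + 0.4 ** t * (5 * (y0 - 10) - 10 * (y1 - 10))
                     + 0.5 ** t * (10 * (y1 - 10) - 4 * (y0 - 10)))
    stochastic = sum(alpha(i) * shocks[t - i] for i in range(t - 1))  # i = 0..t-2
    return deterministic + stochastic


random.seed(0)                                    # arbitrary seed for repeatability
shocks = [random.gauss(0.0, 1.0) for _ in range(30)]
path = simulate(8.0, 12.0, shocks)                # arbitrary initial conditions
gap = max(abs(path[t] - closed_form(8.0, 12.0, shocks, t)) for t in range(2, 30))
print("largest gap:", gap)
```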


9. LAG OPERATORS

If it is not important to know the actual values of the coefficients appearing in the particular solution, it is often more convenient to use lag operators than the method of undetermined coefficients. The lag operator $L$ is defined to be a linear operator such that for any value $y_t$

$$L^iy_t \equiv y_{t-i} \tag{1.77}$$

Thus, $L^i$ preceding $y_t$ simply means to lag $y_t$ by $i$ periods. It is useful to remember the following properties of lag operators:

1. The lag of a constant is a constant: $Lc = c$.

2. The distributive law holds for lag operators. We can set $(L^i + L^j)y_t = L^iy_t + L^jy_t = y_{t-i} + y_{t-j}$.

3. The associative law of multiplication holds for lag operators. We can set $L^iL^jy_t = L^i(L^jy_t) = L^iy_{t-j} = y_{t-i-j}$. Similarly, we can set $L^iL^jy_t = L^{i+j}y_t = y_{t-i-j}$. Note that $L^0y_t = y_t$.

4. $L$ raised to a negative power is actually a lead operator: $L^{-i}y_t = y_{t+i}$. To explain, define $j = -i$ and form $L^jy_t = y_{t-j} = y_{t+i}$.

5. For $|a| < 1$, the infinite sum $(1 + aL + a^2L^2 + a^3L^3 + \cdots)y_t = y_t/(1 - aL)$. This property of lag operators may not seem intuitive, but it follows directly from properties 2 and 3 above.

Proof: Multiply each side by $(1 - aL)$ to form $(1 - aL)(1 + aL + a^2L^2 + a^3L^3 + \cdots)y_t = y_t$. Multiply the two expressions to obtain $(1 - aL + aL - a^2L^2 + a^2L^2 - a^3L^3 + \cdots)y_t = y_t$. Given that $|a| < 1$, the expression $a^nL^ny_t$ converges to zero as $n$ approaches infinity. Thus, the two sides of the equation are equal.

6. For $|a| > 1$, the infinite sum $[1 + (aL)^{-1} + (aL)^{-2} + (aL)^{-3} + \cdots]y_t = -aLy_t/(1 - aL)$.

Hence, $y_t/(1 - aL) = -(aL)^{-1}\sum_{i=0}^{\infty}(aL)^{-i}y_t$.

Proof: Multiply by $(1 - aL)$ to form $(1 - aL)[1 + (aL)^{-1} + (aL)^{-2} + (aL)^{-3} + \cdots]y_t = -aLy_t$. Perform the indicated multiplication to obtain $[1 - aL + (aL)^{-1} - 1 + (aL)^{-2} - (aL)^{-1} + (aL)^{-3} - (aL)^{-2} + \cdots]y_t = -aLy_t$. Given that $|a| > 1$, the expression $a^{-n}L^{-n}y_t$ converges to zero as $n$ approaches infinity. Thus, the two sides of the equation are equal.
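Property 5 can be illustrated by treating a polynomial in $L$ as a list of coefficients, where position $k$ holds the coefficient on $L^k$. This representation and the function names are assumptions made for the sketch; multiplying $(1 - aL)$ by the truncated sum $1 + aL + \cdots + a^nL^n$ should leave $1 - a^{n+1}L^{n+1}$.

```python
def multiply_lag_polys(p, q):
    """Multiply two lag polynomials given as coefficient lists [c0, c1, ...]."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j      # L**i * L**j = L**(i+j), property 3
    return out


def apply_lag_poly(poly, series, t):
    """Apply the lag polynomial to the series at date t (needs t >= len(poly) - 1)."""
    return sum(c * series[t - k] for k, c in enumerate(poly))


a, n = 0.5, 20
truncated_inverse = [a ** k for k in range(n + 1)]      # 1 + aL + ... + a**n L**n
product = multiply_lag_polys([1.0, -a], truncated_inverse)

print(product[0], product[-1])          # leading 1 and the remainder -a**(n+1)
print(max(abs(c) for c in product[1:-1]))   # interior terms cancel
```

As $n$ grows, the remainder $-a^{n+1}$ shrinks toward zero when $|a| < 1$, which is the convergence step in the proof.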


Lag operators.provide a concise notation for writing difference equations. Using 


lag operators, we can write the pth-order equation y, = ag + ayn; ++ + Any,» tE, 
as 


(l-a,L—a,L? --- — a,L? yy, = ag + & 


m 


vd 


16 Difference Equations 


or more compactly as 
A(L)y, = dy t+ & 


where A(L) is the polynomial (1 — a,L — aL? - += —a,L”). Veg Ge : a: 
Since A(L) can be viewed as a polynomial in the lag operator; the notation A( 
used to denote the sum of the coefficients: 
AQ) = 1-4, >a, a, 
As a second example, lag operators can be used to express: the equation y= ao* 


AY Fo + AYp + & + Bini +--+ Beg as 


A(L)y, = ao + B(L)e, 


i ders p and q, respectively. mE 
where A(L) and B(L) are polynomials of or | 
It is straightforward to use lag operators to solve linear difference equations.
Again, consider the first-order equation $y_t = a_0 + a_1y_{t-1} + \varepsilon_t$, where $|a_1| < 1$. Use
the definition of $L$ to form

$$y_t = a_0 + a_1Ly_t + \varepsilon_t \qquad (1.78)$$

Solving for $y_t$, we obtain

$$y_t = (a_0 + \varepsilon_t)/(1 - a_1L) \qquad (1.79)$$

From property 1, we know that $La_0 = a_0$, so that
$a_0/(1 - a_1L) = a_0 + a_1a_0 + a_1^2a_0 + \cdots = a_0/(1 - a_1)$. From property 5, we know that
$\varepsilon_t/(1 - a_1L) = \varepsilon_t + a_1\varepsilon_{t-1} + a_1^2\varepsilon_{t-2} + \cdots$.
Combining these two parts of the solution, we obtain the particular solution
given by (1.21).
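As a quick check that the expansion of (1.79) really solves the first-order equation, the following sketch builds the solution from the constant and the distributed lag of shocks and then substitutes it back. The parameter values and shock series are assumptions for illustration, and shocks before the sample are set to zero.

```python
# Particular solution from (1.79): y_t = a0/(1 - a1) + sum_i a1**i * eps[t-i].
# Assumption: eps is zero before index 0, so the sum is finite.

def particular_solution(a0, a1, eps, t):
    return a0 / (1.0 - a1) + sum(a1**i * eps[t - i] for i in range(t + 1))

a0, a1 = 2.0, 0.5                                  # illustrative, |a1| < 1
eps = [0.4, -1.0, 0.7, 0.0, 1.3, -0.2]             # illustrative shocks
path = [particular_solution(a0, a1, eps, t) for t in range(len(eps))]

# Substitute back into y_t = a0 + a1*y_{t-1} + eps_t; every residual should be zero.
residuals = [path[t] - (a0 + a1 * path[t - 1] + eps[t]) for t in range(1, len(eps))]
print(max(abs(r) for r in residuals))
```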

For practice, we can use lag operators to solve (1.67): $y_t = a_0 + a_1y_{t-1} + \varepsilon_t + \beta_1\varepsilon_{t-1}$,
where $|a_1| < 1$. Use property 2 to form $(1 - a_1L)y_t = a_0 + (1 + \beta_1L)\varepsilon_t$. Solving for
$y_t$ yields

$$y_t = [a_0 + (1 + \beta_1L)\varepsilon_t]/(1 - a_1L)$$

so that

$$y_t = [a_0/(1 - a_1)] + [\varepsilon_t/(1 - a_1L)] + [\beta_1\varepsilon_{t-1}/(1 - a_1L)] \qquad (1.80)$$

Expanding the last two terms of (1.80) gives the same solution found using the
method of undetermined coefficients.
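Now suppose $y_t = a_0 + a_1y_{t-1} + \varepsilon_t$ but that $|a_1| > 1$. The application of property 5
to (1.79) is inappropriate since it implies that $y_t$ is infinite. Instead, expand (1.79)
using property 6:

$$y_t = [a_0/(1 - a_1)] - (a_1L)^{-1} \sum_{i=0}^{\infty} (a_1L)^{-i} \varepsilon_t \qquad (1.81)$$

$$\phantom{y_t} = [a_0/(1 - a_1)] - (1/a_1) \sum_{i=0}^{\infty} (a_1L)^{-i} \varepsilon_{t+1}$$

$$\phantom{y_t} = [a_0/(1 - a_1)] - (1/a_1) \sum_{i=0}^{\infty} a_1^{-i} \varepsilon_{t+1+i} \qquad (1.82)$$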


Lag Operators in Higher-Order Systems 


We can also use lag operators to transform the nth-order equation
$y_t = a_0 + a_1y_{t-1} + a_2y_{t-2} + \cdots + a_ny_{t-n} + \varepsilon_t$ into

$$(1 - a_1L - a_2L^2 - \cdots - a_nL^n)y_t = a_0 + \varepsilon_t$$

or

$$y_t = (a_0 + \varepsilon_t)/(1 - a_1L - a_2L^2 - \cdots - a_nL^n)$$

From our previous analysis (also see Appendix 2 in this chapter), we know that
the stability condition is such that the characteristic roots of the equation
$\alpha^n - a_1\alpha^{n-1} - \cdots - a_n = 0$ all lie within the unit circle. Notice that the values of $\alpha$ solving
the characteristic equation are the reciprocals of the values of $L$ that solve the equation
$1 - a_1L - \cdots - a_nL^n = 0$. In fact, the expression $1 - a_1L - \cdots - a_nL^n$ is often called the
inverse characteristic equation. Thus, in the literature, it is often stated that the stability
condition is for the characteristic roots of $(1 - a_1L - \cdots - a_nL^n)$ to lie outside of
the unit circle.
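The reciprocal relationship between the two sets of roots is easy to confirm in the second-order case. The sketch below uses the quadratic formula for both polynomials; the coefficient values and the helper name are assumptions chosen only for illustration.

```python
import cmath

def quadratic_roots(c2, c1, c0):
    """Roots of c2*x**2 + c1*x + c0 = 0 (c2 != 0); complex roots are allowed."""
    disc = cmath.sqrt(c1 * c1 - 4 * c2 * c0)
    return ((-c1 + disc) / (2 * c2), (-c1 - disc) / (2 * c2))

a1, a2 = 0.5, 0.2                                   # illustrative coefficients

# Characteristic equation: alpha^2 - a1*alpha - a2 = 0
char_roots = quadratic_roots(1.0, -a1, -a2)
# Inverse characteristic equation: 1 - a1*L - a2*L^2 = 0
inverse_roots = quadratic_roots(-a2, -a1, 1.0)

reciprocals = sorted((1 / r for r in char_roots), key=abs)
inverse_sorted = sorted(inverse_roots, key=abs)
print(reciprocals, inverse_sorted)
```

With these values the characteristic roots lie inside the unit circle and the roots of the inverse characteristic equation lie outside it, which is the same stability condition stated two ways.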

In principle, one could use lag operators to actually obtain the coefficients of the
particular solution. To illustrate using the second-order case, consider
$y_t = (a_0 + \varepsilon_t)/(1 - a_1L - a_2L^2)$. If we knew the factors of the quadratic equation were such that
$(1 - a_1L - a_2L^2) = (1 - b_1L)(1 - b_2L)$, we could write

$$y_t = (a_0 + \varepsilon_t)/[(1 - b_1L)(1 - b_2L)]$$

If both $b_1$ and $b_2$ are less than unity in absolute value, we can apply property 5 to
obtain

$$y_t = \frac{a_0/(1 - b_1) + \sum_{i=0}^{\infty} b_1^i \varepsilon_{t-i}}{1 - b_2L}$$

Reapply the rule to $a_0/(1 - b_1)$ and to each of the elements in the summation
$\sum b_1^i \varepsilon_{t-i}$ to obtain the particular solution. If you want to know the actual coefficients
of the process, it is preferable to use the method of undetermined coefficients. The
beauty of lag operators is that they can be used to denote such particular solutions
succinctly. The general model

$$A(L)y_t = a_0 + B(L)\varepsilon_t$$

has the particular solution:

$$y_t = a_0/A(L) + B(L)\varepsilon_t/A(L)$$

10. FORWARD- VERSUS BACKWARD-LOOKING
SOLUTIONS 


As suggested by (1.82), there is a forward-looking solution to any linear difference
equation. The text will not make much use of the forward-looking solution since
future realizations of stochastic variables are not directly observable. However,
knowing how to obtain forward-looking solutions is useful for solving rational
expectations models. Let us return to the simple iterative technique to consider the
forward-looking solution to the first-order equation $y_t = a_0 + a_1y_{t-1} + \varepsilon_t$. Solving for
$y_{t-1}$, we obtain

$$y_{t-1} = -(a_0 + \varepsilon_t)/a_1 + y_t/a_1 \qquad (1.83)$$

Updating one period yields

$$y_t = -(a_0 + \varepsilon_{t+1})/a_1 + y_{t+1}/a_1 \qquad (1.84)$$

Since $y_{t+1} = (y_{t+2} - a_0 - \varepsilon_{t+2})/a_1$, begin iterating forward:

$$y_t = -(a_0 + \varepsilon_{t+1})/a_1 + (y_{t+2} - a_0 - \varepsilon_{t+2})/(a_1)^2$$

$$\phantom{y_t} = -(a_0 + \varepsilon_{t+1})/a_1 - (a_0 + \varepsilon_{t+2})/(a_1)^2 + y_{t+2}/(a_1)^2$$

$$\phantom{y_t} = -(a_0 + \varepsilon_{t+1})/a_1 - (a_0 + \varepsilon_{t+2})/(a_1)^2 + (y_{t+3} - a_0 - \varepsilon_{t+3})/(a_1)^3$$

Therefore, after $n$ iterations,

$$y_t = -a_0 \sum_{i=1}^{n} a_1^{-i} - \sum_{i=1}^{n} a_1^{-i} \varepsilon_{t+i} + y_{t+n}/a_1^n \qquad (1.85)$$
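Equation (1.85) is an identity, so it can be confirmed on any series generated by the first-order equation. In the sketch below the parameter values, the shocks, and the function name are assumptions made for illustration.

```python
# Generate y_t = a0 + a1*y_{t-1} + eps_t forward from an arbitrary y_0.
a0, a1 = 1.0, 1.5                                  # illustrative values
eps = [0.0, 0.3, -0.8, 0.5, 1.1, -0.4, 0.2]        # eps[0] is not used
y = [2.0]
for t in range(1, len(eps)):
    y.append(a0 + a1 * y[t - 1] + eps[t])

def forward_iteration(y, eps, a0, a1, t, n):
    """Right-hand side of (1.85): y_t expressed through y_{t+n} and eps_{t+1..t+n}."""
    constant = -a0 * sum(a1 ** (-i) for i in range(1, n + 1))
    shocks = -sum(a1 ** (-i) * eps[t + i] for i in range(1, n + 1))
    return constant + shocks + y[t + n] / a1 ** n

rebuilt = forward_iteration(y, eps, a0, a1, t=1, n=5)
print(rebuilt, y[1])                               # should agree up to rounding
```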


If we maintain that $|a_1| < 1$, this forward-looking solution will diverge as $n$ becomes
infinitely large. However, if $|a_1| > 1$, the expression $a_1^{-n}$ goes to zero while
$-a_0(a_1^{-1} + a_1^{-2} + a_1^{-3} + \cdots)$ converges to $a_0/(1 - a_1)$. Hence, we can write the forward-
looking particular solution for $y_t$ as

$$y_t = a_0/(1 - a_1) - \sum_{i=1}^{\infty} a_1^{-i} \varepsilon_{t+i} \qquad (1.86)$$


Note that (1.86) is similar in form to (1.82); the difference is that the future values
of the disturbances affect the present. Clearly, if $|a_1| > 1$, the summation is
convergent so that (1.86) is a legitimate particular solution to the difference equation.
Given an initial condition, a stochastic difference equation will have a forward-
and backward-looking solution. For example, using lag operators, we can
write the particular solution to $y_t = a_0 + a_1y_{t-1} + \varepsilon_t$ as $(a_0 + \varepsilon_t)/(1 - a_1L)$. Now multiply
the numerator and denominator by $-a_1^{-1}L^{-1}$ to form

$$y_t = a_0/(1 - a_1) - a_1^{-1}L^{-1}\varepsilon_t/(1 - a_1^{-1}L^{-1})$$

$$\phantom{y_t} = a_0/(1 - a_1) - \sum_{i=1}^{\infty} a_1^{-i} \varepsilon_{t+i} \qquad (1.87)$$

More generally, we can always obtain a forward-looking solution for any nth-order
equation. (For practice in using the alternative methods of solving difference equations,
try to obtain this forward-looking solution using the method of undetermined
coefficients.)
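A short numerical sketch can confirm that the forward-looking expression in (1.86) satisfies the original difference equation when |a_1| > 1. The values below are assumptions, and the shocks are taken to be zero beyond the end of the sample so that the infinite sum is finite.

```python
def forward_solution(a0, a1, eps, t):
    """Equation (1.86), treating eps[j] as zero for j >= len(eps)."""
    tail = sum(a1 ** (-i) * eps[t + i] for i in range(1, len(eps) - t))
    return a0 / (1.0 - a1) - tail

a0, a1 = 1.0, 2.0                                  # illustrative, |a1| > 1
eps = [0.5, -1.0, 0.25, 0.75, -0.5, 1.0]           # illustrative shocks
y = [forward_solution(a0, a1, eps, t) for t in range(len(eps))]

# Substitute into y_t = a0 + a1*y_{t-1} + eps_t; the gaps should all be zero.
gaps = [y[t] - (a0 + a1 * y[t - 1] + eps[t]) for t in range(1, len(eps))]
print(y, max(abs(g) for g in gaps))
```

Even though a_1 is explosive, the resulting sequence stays near a_0/(1 - a_1) = -1 because each value depends only on discounted future shocks.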


Properties of the Alternative Solutions 


The backward- and forward-looking solutions are two mathematically valid solutions
to any nth-order difference equation. In fact, since the equation itself is linear,
it is straightforward to show that any linear combination of the forward- and backward-
looking solutions is also a solution. For economic analysis, however, the distinction
is important since the time paths implied by these alternative solutions are
quite different. First consider the backward-looking solution. If $|a_1| < 1$, the expression
$a_1^i$ converges toward zero as $i \to \infty$. Also, notice that the effect of $\varepsilon_{t-i}$ on $y_t$
is $a_1^i$; if $|a_1| < 1$, the effects of the past $\varepsilon_t$ also diminish over time. Suppose instead
that $|a_1| > 1$; in this instance, the backward-looking solution for $y_t$ explodes.

The situation is reversed using the forward solution. Here, if $|a_1| < 1$, the expression
$a_1^{-i}$ gets infinitely large as $i$ approaches $\infty$. Instead, if $|a_1| > 1$, the forward-
looking solution leads to a finite sequence for $\{y_t\}$. The reason is that $a_1^{-i}$ converges
to zero as $i$ increases. Note that the effect of $\varepsilon_{t+i}$ on $y_t$ is $a_1^{-i}$ in absolute value; if $|a_1| > 1$, the
future values of $\varepsilon_{t+i}$ have a diminishing influence on the current value
of $y_t$.

From a purely mathematical point of view, there is no "most appropriate" solution.
However, economic theory may suggest that a sequence be bounded in the
sense that the limiting value for any value in the sequence is finite. Real interest
rates, real per capita income, and many other economic variables can hardly be expected
to approach either plus or minus infinity. Imposing boundary restrictions entails
using the backward-looking solution if $|a_1| < 1$ and the forward-looking solution
if $|a_1| > 1$. Similar remarks hold for higher-order equations.


Cagan’s Money Demand Function 


Cagan's model of hyperinflation provides an excellent example of choosing between
forward- and backward-looking solutions. Let the demand for
money take the form

$$m_t - p_t = \alpha - \beta(p^e_{t+1} - p_t), \qquad \beta > 0 \qquad (1.88)$$

where $m_t$ = the logarithm of the nominal money supply in $t$
$p_t$ = the logarithm of the price level in $t$
$p^e_{t+1}$ = the logarithm of the price level expected in period $t + 1$

The key point of the model is that the demand for real money balances $(m_t - p_t)$
is negatively related to the expected rate of inflation $(p^e_{t+1} - p_t)$. Because Cagan was
interested in the relationship between inflation and money demand, all other variables
were subsumed into the constant $\alpha$. Since our task is to work with forward-
looking solutions, let the money supply function simply be the process

$$m_t = m + \varepsilon_t$$

where $m$ = the average value of the money supply
$\varepsilon_t$ = a disturbance term with a mean value of zero

As opposed to the cobweb model, let individuals have forward-looking perfect
foresight so the expected price for $t + 1$ equals the price that actually prevails:

$$p^e_{t+1} = p_{t+1}$$

Under perfect foresight, agents in period $t$ are assumed to know the price level in
$t + 1$. In the context of the example, agents are able to solve difference equations
and can simply "figure out" the time path of prices. Thus, we can write the money
market equilibrium condition as

$$m + \varepsilon_t - p_t = \alpha - \beta(p_{t+1} - p_t)$$

or

$$p_{t+1} - (1 + 1/\beta)p_t = -(m - \alpha)/\beta - \varepsilon_t/\beta \qquad (1.89)$$


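For practice, we use the method of undetermined coefficients to obtain the particular
solution. (You should check your abilities by repeating the exercise using
lag operators.) We use the forward-looking solution since the coefficient
$(1 + 1/\beta)$ is greater than unity in absolute value. Try the challenge solution:

$$p_t = b_0 + \sum_{i=0}^{\infty} \alpha_i \varepsilon_{t+i}$$

Substituting this challenge solution into the above, we obtain

$$b_0 + \sum_{i=0}^{\infty} \alpha_i \varepsilon_{t+1+i} - \frac{1 + \beta}{\beta}\left(b_0 + \sum_{i=0}^{\infty} \alpha_i \varepsilon_{t+i}\right) = (\alpha - m - \varepsilon_t)/\beta \qquad (1.90)$$

For (1.90) to be an identity for all possible realizations of $\{\varepsilon_t\}$, it must be the
case that

$$b_0 - b_0(1 + \beta)/\beta = (\alpha - m)/\beta \;\Rightarrow\; b_0 = m - \alpha$$
$$-\alpha_0(1 + \beta)/\beta = -1/\beta \;\Rightarrow\; \alpha_0 = 1/(1 + \beta)$$
$$\alpha_0 - \alpha_1(1 + \beta)/\beta = 0 \;\Rightarrow\; \alpha_1 = \beta/(1 + \beta)^2$$
$$\vdots$$
$$\alpha_i - \alpha_{i+1}(1 + \beta)/\beta = 0 \;\Rightarrow\; \alpha_i = \beta^i/(1 + \beta)^{i+1}$$

In compact form, the particular solution can be written as

$$p_t = m - \alpha + (1/\beta) \sum_{i=0}^{\infty} [\beta/(1 + \beta)]^{1+i} \varepsilon_{t+i} \qquad (1.91)$$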
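The coefficients in (1.91) can be verified by substituting the implied price path back into (1.89). The following sketch does so for assumed values of m, α, and β and an arbitrary finite set of shocks (shocks beyond the sample are set to zero); the function names are illustrative.

```python
def cagan_weight(beta, i):
    """Coefficient on eps_{t+i} in (1.91): beta**i / (1 + beta)**(i + 1)."""
    return beta ** i / (1.0 + beta) ** (i + 1)

def cagan_price(m, alpha, beta, eps, t):
    """Equation (1.91), treating eps[j] as zero for j >= len(eps)."""
    return m - alpha + sum(cagan_weight(beta, i) * eps[t + i]
                           for i in range(len(eps) - t))

m, alpha, beta = 5.0, 1.0, 2.0                     # illustrative values
eps = [0.3, -0.6, 0.9, 0.0, -0.3]                  # illustrative money supply shocks
p = [cagan_price(m, alpha, beta, eps, t) for t in range(len(eps) + 1)]

# Equation (1.89): p_{t+1} - (1 + 1/beta)*p_t = (alpha - m - eps_t)/beta
errors = [p[t + 1] - (1 + 1 / beta) * p[t] - (alpha - m - eps[t]) / beta
          for t in range(len(eps))]
print(max(abs(e) for e in errors))
```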


The next step is to find the homogeneous solution. Form the homogeneous equation,
$p_{t+1} - (1 + 1/\beta)p_t = 0$. For any arbitrary constant $A$, it is easy to verify that the
solution is

$$p^h_t = A(1 + 1/\beta)^t$$

Hence, the general solution is

$$p_t = m - \alpha + (1/\beta) \sum_{i=0}^{\infty} [\beta/(1 + \beta)]^{1+i} \varepsilon_{t+i} + A(1 + 1/\beta)^t \qquad (1.92)$$


If you examine (1.92) closely, you will note that the impulse response function is
convergent; the expression $[\beta/(1 + \beta)]^{1+i}$ converges to zero as $i$ approaches infinity.
However, the homogeneous portion of the solution is divergent. For (1.92) to yield
a nonexplosive price sequence, we must be able to set the arbitrary constant equal
to zero. To understand the economic implication of setting $A = 0$, suppose that the
initial condition is such that the price level in period zero is $p_0$. If we impose this
initial condition, (1.92) becomes

$$p_0 = m - \alpha + (1/\beta) \sum_{i=0}^{\infty} [\beta/(1 + \beta)]^{1+i} \varepsilon_i + A$$

Solving for $A$ yields

$$A = p_0 + \alpha - m - (1/\beta) \sum_{i=0}^{\infty} [\beta/(1 + \beta)]^{1+i} \varepsilon_i$$

Thus, the initial condition must be such that

$$A = 0 \quad \text{or} \quad p_0 = m - \alpha + (1/\beta) \sum_{i=0}^{\infty} [\beta/(1 + \beta)]^{1+i} \varepsilon_i \qquad (1.93)$$


Examine the three separate components of (1.92). The deterministic expression
$m - \alpha$ is the same type of long-run "equilibrium" condition encountered on several
other occasions; a stable sequence tends to converge toward the deterministic portion
of its particular solution. The second component of the particular solution consists
of the short-run responses induced by the various $\varepsilon_t$ shocks. These movements
are necessarily of a short-term duration because the coefficients of the impulse response
function must decay. The point is that the particular solution captures the
overall long-run and short-run equilibrium behavior of the system. Finally, the homogeneous
solution can be viewed as a measure of disequilibrium in the initial period.
Since (1.91) is the overall equilibrium solution for period $t$, it should be clear
that the value of $p_0$ in (1.93) is the equilibrium value of the price for period zero.
After all, (1.93) is nothing more than (1.91) with the time subscript lagged $t$ periods.
Thus, the expression $A(1 + 1/\beta)^t$ must be zero if the deviation from equilibrium
in the initial period is zero.

Imposing the requirement that the $\{p_t\}$ sequence be bounded necessitates that the
general solution be

$$p_t = m - \alpha + (1/\beta) \sum_{i=0}^{\infty} [\beta/(1 + \beta)]^{1+i} \varepsilon_{t+i}$$

Notice that the price in each and every period $t$ is proportional to the mean value
of the money supply; this point is easy to verify since all variables are expressed in
logarithms and $\partial p_t/\partial m = 1$. Temporary changes in the money supply behave in an
interesting fashion. The impulse response function indicates that future increases in
the money supply, represented by the various $\varepsilon_{t+i}$, serve to increase the price level
in the current period. The idea is that future money supply increases imply higher
prices in the future. Forward-looking agents reduce their current money holdings,
with a consequent increase in the current price level, in response to this anticipated
inflation.
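To see the shape of this impulse response, the sketch below tabulates the effect on the current price of a unit money supply shock expected i periods ahead, using the weights from the bounded solution. The value of β is an assumption chosen for illustration.

```python
beta = 2.0                                          # illustrative value, beta > 0

# Effect on p_t of a unit shock eps_{t+i}: (1/beta) * [beta/(1 + beta)]**(1 + i)
weights = [beta ** i / (1.0 + beta) ** (i + 1) for i in range(200)]
total = sum(weights)

for i in range(4):
    print(i, round(weights[i], 4))                  # a shock further in the future matters less today
print(round(total, 6))
```

The weights decline geometrically with the horizon, and their sum approaches one: a unit increase in every current and future shock acts like a unit increase in m, which matches the one-for-one effect of the mean money supply on the price level.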


SUMMARY AND CONCLUSIONS 


Time-series econometrics is concerned with the estimation of difference equations
containing stochastic components. Originally, time-series models were used for
forecasting. Uncovering the dynamic path of a series improves forecasts since the
predictable components of the series can be extrapolated into the future. The growing
interest in economic dynamics has given a new emphasis to time-series econometrics.
Stochastic difference equations arise quite naturally from dynamic economic
models. Appropriately estimated equations can be used for the interpretation
of economic data and for hypothesis testing.

This introductory chapter focused on methods of "solving" stochastic difference
equations. Although iteration can be useful, it is impractical in many circumstances.
The solution to a linear difference equation can be divided into two parts: a particular
solution and a homogeneous solution. One complicating factor is that the homogeneous
solution is not unique. The general solution is a linear combination of the
particular solution and all homogeneous solutions. Imposing $n$ initial conditions on
the general solution of an nth-order equation yields a unique solution.

The homogeneous portion of a difference equation is a measure of the "disequilibrium"
in the initial period(s). The homogeneous equation is especially important
in that it yields the characteristic roots; an nth-order equation has $n$ such characteristic
roots. If all the characteristic roots lie within the unit circle, the series will be
convergent. As you will see in Chapter 2, there is a direct relationship between the
stability conditions and the issue of whether an economic variable is stationary or
nonstationary.

The method of undetermined coefficients and use of lag operators are powerful
tools for obtaining the particular solution. The particular solution will be a linear
function of the current and past values of the forcing process. In addition, this solution
may contain an intercept term and a polynomial function of time. Unit roots
and characteristic roots outside of the unit circle require the imposition of an initial
condition for the particular solution to be meaningful. Some economic models allow
for forward-looking solutions; in such circumstances, anticipated future events
have consequences for the present period.


The tools developed in this chapter are aimed at paving the way for the study of
time-series econometrics. It is a good idea to work all the exercises presented below.
Characteristic roots, the method of undetermined coefficients, and lag operators
will be encountered throughout the remainder of the text.




QUESTIONS AND EXERCISES


1. Consider the difference equation $y_t = a_0 + a_1y_{t-1}$ with the initial condition $y_0$.
Jill solved the difference equation by iterating backward:

$$y_t = a_0 + a_1y_{t-1}$$
$$\phantom{y_t} = a_0 + a_1(a_0 + a_1y_{t-2})$$
$$\phantom{y_t} = a_0 + a_0a_1 + a_0a_1^2 + \cdots + a_0a_1^{t-1} + a_1^ty_0$$

Bill added the homogeneous and particular solutions to obtain
$y_t = a_0/(1 - a_1) + a_1^t[y_0 - a_0/(1 - a_1)]$.

A. Show that the two solutions are identical for $|a_1| < 1$.

B. Show that for $a_1 = 1$, Jill's solution is equivalent to $y_t = a_0t + y_0$. How
would you use Bill's method to arrive at this same conclusion in the case
$a_1 = 1$?


2. The cobweb model in Section 5 assumed static price expectations. Consider an
alternative formulation called adaptive expectations. Let the expected price in $t$
(denoted by $p^*_t$) be a weighted average of the price in $t - 1$ and the price expectation
of the previous period. Formally,

$$p^*_t = \alpha p_{t-1} + (1 - \alpha)p^*_{t-1}, \qquad 0 < \alpha \le 1$$

Clearly, when $\alpha = 1$, the static and adaptive expectations schemes are equivalent.
An interesting feature of this model is that it can be viewed as a difference
equation expressing the expected price as a function of its own lagged
value and the forcing variable $p_{t-1}$.

A. Find the homogeneous solution for $p^*_t$.

B. Use lag operators to find the particular solution. Check your answer by substituting
it in the original difference equation.


3. Suppose that the money supply process has the form $m_t = m + \rho m_{t-1} + \varepsilon_t$, where
$m$ is a constant and $0 < \rho < 1$.

A. Show that it is possible to express $m_{t+n}$ in terms of the known value $m_t$ and
the sequence $\{\varepsilon_{t+1}, \varepsilon_{t+2}, \ldots, \varepsilon_{t+n}\}$.

B. Suppose that all values of $\varepsilon_{t+i}$ for $i > 0$ have a mean value of zero. Explain
how you could use your result in part A to forecast the money supply $n$ periods
into the future.


4. Find the particular solutions for each of the following: 


i. $y_t = a_1y_{t-1} + \varepsilon_t + \beta_1\varepsilon_{t-1}$




ii. $y_t = a_1y_{t-1} + \varepsilon_{1t} + \beta\varepsilon_{2t}$ (Hint: The form of the solution is
$y_t = \sum c_i\varepsilon_{1t-i} + \sum d_i\varepsilon_{2t-i}$.)


5. The unit root problem in time-series econometrics is concerned with characteristic
roots that are equal to unity. In order to preview the issue:

A. Find the homogeneous solution to each of the following. (Hint: Each has at
least one unit root.)

i. $y_t = a_0 + 1.5y_{t-1} - 0.5y_{t-2} + \varepsilon_t$
ii. $y_t = a_0 + y_{t-2} + \varepsilon_t$
iii. $y_t = a_0 + 2y_{t-1} - y_{t-2} + \varepsilon_t$
iv. $y_t = a_0 + y_{t-1} + 0.25y_{t-2} - 0.25y_{t-3} + \varepsilon_t$

B. Show that each of the backward-looking solutions is not convergent.

C. Show that Equation i can be written entirely in first differences; that is,
$\Delta y_t = a_0 + 0.5\Delta y_{t-1} + \varepsilon_t$. Find the particular solution for $\Delta y_t$. (Hint: Define
$y^*_t = \Delta y_t$ so that $y^*_t = a_0 + 0.5y^*_{t-1} + \varepsilon_t$. Find the particular solution for $y^*_t$ in
terms of the $\{\varepsilon_t\}$ sequence.)

D. Similarly transform the other equations into their first-difference form. Find
the backward-looking particular solution, if it exists, for the transformed
equations.

E. Given the initial condition $y_0$, find the solution for $y_t = a_0 - y_{t-1} + \varepsilon_t$.


6. A researcher estimated the following relationship for the inflation rate ($\pi_t$):

$$\pi_t = -0.05 + 0.7\pi_{t-1} + 0.6\pi_{t-2} + \varepsilon_t$$

A. Suppose that in periods 0 and 1, the inflation rate was 10 and 11%, respectively.
Find the homogeneous, particular, and general solutions for the inflation
rate.

B. Discuss the shape of the impulse response function. Given that the United
States is not headed for runaway inflation, why do you believe that the researcher's
equation is poorly estimated?


7. Consider the stochastic process $y_t = a_0 + a_2y_{t-2} + \varepsilon_t$.


A. Find the homogeneous solution and determine the stability condition. 


B. Find the particular solution using the method of undetermined coefficients. 


8. Reconsider the Cagan demand for money function in which
$m_t - p_t = \alpha - \beta(p_{t+1} - p_t)$.

A. Show that the backward-looking particular solution for $p_t$ is divergent.

B. Obtain the forward-looking particular solution for $p_t$ in terms of the $\{m_t\}$
sequence. In forming the general solution, why is it necessary to assume
that the money market is in long-run equilibrium?

C. Find the impact multiplier. How does an increase in $m_t$ affect $p_t$? Provide
an intuitive explanation of the shape of the entire impulse response function.


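9. For each of the following, verify that the posited solution satisfies the difference
equation. The symbols $c$, $c_0$, and $a_0$ denote constants:

Equation                                  Solution
A. $y_t - y_{t-1} = 0$                    $y_t = c$
B. $y_t - y_{t-1} = a_0$                  $y_t = c + a_0t$
C. $y_t - y_{t-2} = 0$                    $y_t = c + c_0(-1)^t$
D. $y_t - y_{t-2} = \varepsilon_t$        $y_t = c + c_0(-1)^t + \varepsilon_t + \varepsilon_{t-2} + \varepsilon_{t-4} + \cdots$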


10. Part 1: For each of the following, determine whether $\{y_t\}$ represents a stable
process. Determine whether the characteristic roots are real or imaginary and
whether the real parts are positive or negative.

A. $y_t - 1.2y_{t-1} + 0.2y_{t-2}$
B. $y_t - 1.2y_{t-1} + 0.4y_{t-2}$
C. $y_t - 1.2y_{t-1} - 1.2y_{t-2}$
D. $y_t + 1.2y_{t-1}$
E. $y_t - 0.7y_{t-1} - 0.25y_{t-2} + 0.175y_{t-3} = 0$
[Hint: $(x - 0.5)(x + 0.5)(x - 0.7) = x^3 - 0.7x^2 - 0.25x + 0.175$.]

Part 2: Write each of the above equations using lag operators. Determine the
characteristic roots of the inverse characteristic equation.


11. Consider the stochastic difference equation:

$$y_t = 0.8y_{t-1} + \varepsilon_t - 0.5\varepsilon_{t-1}$$

A. Suppose that the initial conditions are such that $y_0 = 0$ and $\varepsilon_0 = \varepsilon_{-1} = 0$.
Now suppose that $\varepsilon_1 = 1$. Determine the values $y_1$ through $y_5$ by forward
iteration.

B. Find the homogeneous and particular solutions.

C. Impose the initial conditions in order to obtain the general solution.

D. Trace out the effect of an $\varepsilon_t$ shock on the entire time path of the $\{y_t\}$ sequence.

12. Use Equation (1.5) to determine the restrictions on $\alpha$ and $\beta$ necessary to ensure
that the $\{y_t\}$ process is stable.




ENDNOTES 


1. Another possibility is to obtain the forward-looking solution; such solutions are discussed
in Section 10.

2. Alternatively, you can substitute (1.26) into (1.17). Note that when $\varepsilon_t$ is a pure random
disturbance, $y_t = a_0 + y_{t-1} + \varepsilon_t$ is called a random walk plus drift model.

3. Any linear equation in the variables $x_1$ through $x_n$ is homogeneous if it has the form
$a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0$. To obtain the homogeneous portion of (1.10), simply set the intercept
term $a_0$ and forcing process $x_t$ equal to zero. Hence, the homogeneous equation for
(1.10) is $y_t = a_1y_{t-1} + a_2y_{t-2} + \cdots + a_ny_{t-n}$.

4. If $b > a$, the demand and supply curves do not intersect in the positive quadrant. The assumption
$a > b$ guarantees that the equilibrium price is positive.

5. For example, if the forcing process is $x_t = \varepsilon_t + \beta_1\varepsilon_{t-1} + \beta_2\varepsilon_{t-2} + \cdots$, the impact multiplier
can be taken as the partial derivative of $y_t$ with respect to $x_t$. However, this text follows
the usual practice of considering multipliers with respect to the $\{\varepsilon_t\}$ process.


APPENDIX 1 Imaginary Roots and de Moivre’s Theorem 


Consider a second-order difference equation $y_t = a_1y_{t-1} + a_2y_{t-2}$ such that the discriminant
$d$ is negative (i.e., $d = a_1^2 + 4a_2 < 0$). From Section 6, we know that the
full homogeneous solution can be written in the form

$$y^h_t = A_1\alpha_1^t + A_2\alpha_2^t \qquad (A1.1)$$

where the two imaginary characteristic roots are

$$\alpha_1 = (a_1 + i\sqrt{-d})/2 \quad \text{and} \quad \alpha_2 = (a_1 - i\sqrt{-d})/2 \qquad (A1.2)$$

The purpose of this appendix is to explain how to rewrite and interpret (A1.1) in
terms of standard trigonometric functions. You might first want to refresh your
memory concerning two useful trig identities. For any two angles $\theta_1$ and $\theta_2$,

$$\sin(\theta_1 + \theta_2) = \sin(\theta_1)\cos(\theta_2) + \cos(\theta_1)\sin(\theta_2)$$
$$\cos(\theta_1 + \theta_2) = \cos(\theta_1)\cos(\theta_2) - \sin(\theta_1)\sin(\theta_2) \qquad (A1.3)$$

If $\theta_1 = \theta_2$, we can drop subscripts and form

$$\sin(2\theta) = 2\sin(\theta)\cos(\theta)$$
$$\cos(2\theta) = \cos(\theta)\cos(\theta) - \sin(\theta)\sin(\theta) \qquad (A1.4)$$


The first task is to demonstrate how to express imaginary numbers in the complex
plane. Consider Figure A1.1 in which the horizontal axis measures real numbers
and the vertical axis imaginary numbers. The complex number $a + bi$ can be
represented by the point $a$ units from the origin along the horizontal axis and $b$
units from the origin along the vertical axis. It is convenient to represent the distance
from the origin by the length of the vector denoted by $r$. Consider angle $\theta$ in
triangle $0ab$ and note that $\cos(\theta) = a/r$ and $\sin(\theta) = b/r$. Hence, the lengths $a$ and $b$
can be measured by

$$a = r\cos(\theta) \quad \text{and} \quad b = r\sin(\theta)$$

[Figure A1.1 A graphical representation of complex numbers. Horizontal axis: Real; vertical axis: Imaginary.]

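In terms of (A1.2), we can define $a = a_1/2$ and $b = \sqrt{-d}/2$. Thus, the characteristic
roots $\alpha_1$ and $\alpha_2$ can be written as

$$\alpha_1 = a + bi = r[\cos(\theta) + i\sin(\theta)]$$
$$\alpha_2 = a - bi = r[\cos(\theta) - i\sin(\theta)] \qquad (A1.5)$$

The next step is to consider the expressions $\alpha_1^t$ and $\alpha_2^t$. Begin with the expression
$\alpha_1^2$ and recall that $i^2 = -1$:

$$\alpha_1^2 = \{r[\cos(\theta) + i\sin(\theta)]\}\{r[\cos(\theta) + i\sin(\theta)]\}$$
$$\phantom{\alpha_1^2} = r^2[\cos(\theta)\cos(\theta) - \sin(\theta)\sin(\theta) + 2i\sin(\theta)\cos(\theta)]$$

From (A1.4),

$$\alpha_1^2 = r^2[\cos(2\theta) + i\sin(2\theta)]$$

If we continue in this fashion, it is straightforward to demonstrate that

$$\alpha_1^t = r^t[\cos(t\theta) + i\sin(t\theta)] \quad \text{and} \quad \alpha_2^t = r^t[\cos(t\theta) - i\sin(t\theta)] \qquad (A1.6)$$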
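The result in (A1.6) is de Moivre's theorem, and it is easy to confirm numerically. The sketch below uses assumed coefficients with a negative discriminant and compares direct complex exponentiation with the polar form.

```python
import cmath
import math

a1, a2 = 0.6, -0.25                        # illustrative coefficients with d < 0
d = a1 ** 2 + 4 * a2                       # discriminant, here -0.64
alpha_1 = complex(a1 / 2, math.sqrt(-d) / 2)

r, theta = cmath.polar(alpha_1)            # alpha_1 = r[cos(theta) + i sin(theta)]
t = 7
direct = alpha_1 ** t
polar = r ** t * complex(math.cos(t * theta), math.sin(t * theta))
print(r, direct, polar)                    # direct and polar should agree
```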




Since $y^h_t$ is a real number and $\alpha_1$ and $\alpha_2$ are complex, it follows that $A_1$ and $A_2$
must be complex. Although $A_1$ and $A_2$ are arbitrary complex numbers, they must
have the form

$$A_1 = B_1[\cos(B_2) + i\sin(B_2)] \quad \text{and} \quad A_2 = B_1[\cos(B_2) - i\sin(B_2)] \qquad (A1.7)$$

where $B_1$ and $B_2$ are arbitrary real numbers measured in radians.
In order to calculate $A_1(\alpha_1^t)$, use (A1.6) and (A1.7) to form

$$A_1\alpha_1^t = B_1[\cos(B_2) + i\sin(B_2)]r^t[\cos(t\theta) + i\sin(t\theta)]$$
$$\phantom{A_1\alpha_1^t} = B_1r^t[\cos(B_2)\cos(t\theta) - \sin(B_2)\sin(t\theta) + i\cos(t\theta)\sin(B_2) + i\sin(t\theta)\cos(B_2)]$$

Using (A1.3), we obtain

$$A_1\alpha_1^t = B_1r^t[\cos(t\theta + B_2) + i\sin(t\theta + B_2)] \qquad (A1.8)$$

You should use the same technique to convince yourself that

$$A_2\alpha_2^t = B_1r^t[\cos(t\theta + B_2) - i\sin(t\theta + B_2)] \qquad (A1.9)$$

Since the homogeneous solution $y^h_t$ is the sum of (A1.8) and (A1.9),

$$y^h_t = B_1r^t[\cos(t\theta + B_2) + i\sin(t\theta + B_2)] + B_1r^t[\cos(t\theta + B_2) - i\sin(t\theta + B_2)]$$
$$\phantom{y^h_t} = 2B_1r^t\cos(t\theta + B_2) \qquad (A1.10)$$

Since $B_1$ is arbitrary, the homogeneous solution can be written in terms of the arbitrary
constants $B_2$ and $B_3$:

$$y^h_t = B_3r^t\cos(t\theta + B_2) \qquad (A1.11)$$

Now imagine a circle with a radius of unity superimposed on Figure A1.1. The
stability condition is for the distance $r = 0b$ to be less than unity. Hence, in the literature
it is said that the stability condition is for the characteristic roots to lie within
this unit circle.
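The trigonometric form in (A1.11) can be compared with direct iteration of the homogeneous equation. In the sketch below the coefficients and the constants B_2 and B_3 are arbitrary assumed values; r and θ follow from a = a_1/2 and b = √(-d)/2, so that r = √(-a_2) and cos(θ) = a_1/(2r).

```python
import math

a1, a2 = 0.6, -0.25                        # illustrative; d = a1^2 + 4*a2 < 0
r = math.sqrt(-a2)                         # modulus of the complex roots
theta = math.acos(a1 / (2 * r))            # angle such that cos(theta) = a/r
B3, B2 = 2.0, 0.4                          # arbitrary constants (assumed values)

def closed_form(t):
    """Equation (A1.11): B3 * r**t * cos(t*theta + B2)."""
    return B3 * r ** t * math.cos(t * theta + B2)

# Start the recursion from the first two closed-form values, then iterate
# y_t = a1*y_{t-1} + a2*y_{t-2}.
y = [closed_form(0), closed_form(1)]
for t in range(2, 12):
    y.append(a1 * y[t - 1] + a2 * y[t - 2])

max_gap = max(abs(y[t] - closed_form(t)) for t in range(12))
print(max_gap)
```

Because r = 0.5 here, the iterated path oscillates while shrinking toward zero, as the stability condition suggests.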


APPENDIX 2 Characteristic Roots in Higher-Order 
Equations 


The characteristic equation to an nth-order difference equation is

$$\alpha^n - a_1\alpha^{n-1} - a_2\alpha^{n-2} - \cdots - a_n = 0 \qquad (A1.12)$$

As stated in Section 6, the $n$ values of $\alpha$ that solve this characteristic equation
are called the characteristic roots. Denote the $n$ solutions by $\alpha_1, \alpha_2, \ldots, \alpha_n$. Given
the results in Section 4, the linear combination $A_1\alpha_1^t + A_2\alpha_2^t + \cdots + A_n\alpha_n^t$ is also a solution
to (A1.12).
A priori, the characteristic roots can take on any values. There is no restriction
that they be real versus complex nor any restriction concerning their sign or magnitude.
Consider the following possibilities:


1. All the $\alpha_i$ are real and distinct. There are several important subcases. First suppose
that each value of $\alpha_i$ is less than unity in absolute value. In this case, the
homogeneous solution (A1.12) converges since the limit of each $\alpha_i^t$ equals zero
as $t$ approaches infinity. For a negative value of $\alpha_i$, the expression $\alpha_i^t$ is positive
for even values of $t$ and negative for odd values of $t$. Thus, if any of the $\alpha_i$ are
negative (but less than 1 in absolute value), the solution will tend to exhibit
some oscillation. If any of the $\alpha_i$ are greater than unity in absolute value, the solution
will diverge.

2. All the $\alpha_i$ are real but $m \le n$ of the roots are repeated. Let the solution be
such that $\alpha_1 = \alpha_2 = \cdots = \alpha_m$. Call the single distinct value of this root $\alpha^*$ and let
the other $n - m$ roots be denoted by $\alpha_{m+1}$ through $\alpha_n$. In the case of a second-order
equation with a repeated root, you saw that one solution was $A_1\alpha^t$ and the
other was $A_2t\alpha^t$. With $m$ repeated roots, it is easily verified that $t\alpha^{*t}, t^2\alpha^{*t}, \ldots,
t^{m-1}\alpha^{*t}$ are also solutions to the homogeneous equation. With $m$ repeated roots,
the linear combination of all these solutions is

$$A_1\alpha^{*t} + A_2t\alpha^{*t} + A_3t^2\alpha^{*t} + \cdots + A_mt^{m-1}\alpha^{*t} + A_{m+1}\alpha_{m+1}^t + \cdots + A_n\alpha_n^t \qquad (A1.13)$$

3. Some of the roots are complex. Complex roots (which necessarily come in
conjugate pairs) have the form $\alpha_1 \pm i\theta$, where $\alpha_1$ and $\theta$ are real numbers and $i$ is
defined to be $\sqrt{-1}$. For any such pair, a solution to the homogeneous equation is
$A_1(\alpha_1 + i\theta)^t + A_2(\alpha_1 - i\theta)^t$, where $A_1$ and $A_2$ are arbitrary constants. Transforming
to polar coordinates, we can write the associated two solutions in the form
$B_1r^t\cos(\theta t + B_2)$ with arbitrary constants $B_1$ and $B_2$. Here stability hinges on the
magnitude of $r^t$; if $|r| < 1$, the system converges. However, even if there is
convergence, convergence is not direct since the sine and cosine functions impart
oscillatory behavior to the time path of $y_t$. For example, if there are three
roots, two of which are complex, the homogeneous solution has the form

$$B_1r^t\cos(\theta t + B_2) + A_3\alpha_3^t$$


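Stability of Higher-Order Systems: In practice, it is difficult to find the actual
values of the characteristic roots. Unless the characteristic equation is easily factored,
it is necessary to use numerical methods to obtain the characteristic roots.
However, for most purposes, it is sufficient to know the qualitative properties of the
solution; usually, it is sufficient to know whether all the roots lie within the unit circle.
The Schur theorem gives the necessary and sufficient conditions for stability.
Given the characteristic equation of (A1.12), the theorem states that if all the $n$ determinants
below are positive, the real parts of all characteristic roots are less than 1
in absolute value:

$$\Delta_1 = \begin{vmatrix} 1 & -a_n \\ -a_n & 1 \end{vmatrix}, \qquad
\Delta_2 = \begin{vmatrix} 1 & 0 & -a_n & -a_{n-1} \\ -a_1 & 1 & 0 & -a_n \\ -a_n & 0 & 1 & -a_1 \\ -a_{n-1} & -a_n & 0 & 1 \end{vmatrix}$$

$$\Delta_3 = \begin{vmatrix}
1 & 0 & 0 & -a_n & -a_{n-1} & -a_{n-2} \\
-a_1 & 1 & 0 & 0 & -a_n & -a_{n-1} \\
-a_2 & -a_1 & 1 & 0 & 0 & -a_n \\
-a_n & 0 & 0 & 1 & -a_1 & -a_2 \\
-a_{n-1} & -a_n & 0 & 0 & 1 & -a_1 \\
-a_{n-2} & -a_{n-1} & -a_n & 0 & 0 & 1
\end{vmatrix}, \quad \ldots,$$

$$\Delta_n = \begin{vmatrix}
1 & 0 & \cdots & 0 & -a_n & -a_{n-1} & \cdots & -a_1 \\
-a_1 & 1 & \cdots & 0 & 0 & -a_n & \cdots & -a_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
-a_{n-1} & -a_{n-2} & \cdots & 1 & 0 & 0 & \cdots & -a_n \\
-a_n & 0 & \cdots & 0 & 1 & -a_1 & \cdots & -a_{n-1} \\
-a_{n-1} & -a_n & \cdots & 0 & 0 & 1 & \cdots & -a_{n-2} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
-a_1 & -a_2 & \cdots & -a_n & 0 & 0 & \cdots & 1
\end{vmatrix}$$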


To understand the way each determinant is formed, note that each can be partitioned
into four subareas. Each subarea of $\Delta_i$ is a triangular $i \times i$ matrix. The northwest
subarea has the value 1 on the diagonal and all zeros above the diagonal. The
subscript increases by 1 as we move down any column beginning from the diagonal.
The southeast subarea is the transpose of the northwest subarea. Notice that the
northeast subarea has $a_n$ on the diagonal and all zeros below the diagonal. The subscript
decreases by 1 as we move up any column beginning from the diagonal. The
southwest subarea is the transpose of the northeast subarea. As defined above, the
value of $a_0$ is unity.
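The verbal recipe for the four subareas translates directly into code. The sketch below is one reading of that recipe, with entries entered as $-a_j$ and 1 on the main diagonal; the function names, the simple elimination routine, and the test coefficients are assumptions for illustration rather than a reference implementation.

```python
def schur_matrix(a, k):
    """2k x 2k matrix whose determinant is Delta_k, for a = [a_1, ..., a_n], 1 <= k <= n."""
    n = len(a)
    coef = [1.0] + [-x for x in a]         # coef[0] = 1 on the diagonal, coef[j] = -a_j
    # Northwest: lower triangular, subscripts rise moving down a column.
    nw = [[coef[i - j] if i >= j else 0.0 for j in range(k)] for i in range(k)]
    # Northeast: upper triangular with -a_n on the diagonal.
    ne = [[coef[n - (j - i)] if j >= i else 0.0 for j in range(k)] for i in range(k)]
    top = [nw[i] + ne[i] for i in range(k)]
    # Southwest is the transpose of northeast; southeast is the transpose of northwest.
    bottom = [[ne[j][i] for j in range(k)] + [nw[j][i] for j in range(k)] for i in range(k)]
    return top + bottom

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    size, det = len(m), 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda row: abs(m[row][col]))
        if abs(m[pivot][col]) < 1e-14:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for row in range(col + 1, size):
            factor = m[row][col] / m[col][col]
            for c in range(col, size):
                m[row][c] -= factor * m[col][c]
    return det

def schur_stable(a):
    """True when Delta_1, ..., Delta_n are all positive."""
    return all(determinant(schur_matrix(a, k)) > 0 for k in range(1, len(a) + 1))

print(schur_stable([0.5, 0.2]))            # roots about 0.76 and -0.26
print(schur_stable([0.7, 0.6]))            # roots 1.2 and -0.5
```

For a second-order equation the two determinants reduce to $1 - a_2^2$ and $(1 + a_2)^2[(1 - a_2)^2 - a_1^2]$, so the check agrees with the familiar conditions on $a_1$ and $a_2$.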

Special Cases: As stated above, the Schur theorem gives the necessary and sufficient
conditions for all roots to lie in the unit circle. Rather than calculate all these
determinants, it is often possible to use the simple rules discussed in Section 6.
Those of you familiar with matrix algebra may wish to consult Samuelson (1941)
for formal proofs of these conditions.


Chapter 2 


STATIONARY TIME-SERIES MODELS 


The theory of linear difference equations can be extended to allow the forcing 
process {x,} to be stochastic. This class of linear stochastic difference equations un- 
derlies much of the theory of time-series econometrics. Especially important is the 
Box—Jenkins (1976) methodology for estimating time-series models of the form: 


$$y_t = a_0 + a_1y_{t-1} + \cdots + a_py_{t-p} + \varepsilon_t + \beta_1\varepsilon_{t-1} + \cdots + \beta_q\varepsilon_{t-q}$$


Such models are called autoregressive integrated moving average (ARIMA) 
time-series models. The aims of this chapter are to: 


1. Present the theory of stochastic linear difference equations and consider the 
time-series properties of stationary ARIMA models; a stationary ARIMA model 
is called an ARMA model. It is shown that the stability conditions of the previ- 
ous chapter are necessary conditions for stationarity. 


2. Develop the tools used in estimating ARMA models. Especially useful are the 
autocorrelation and partial autocorrelation functions. It is shown how the 
Box—Jenkins methodology relies on these tools to estimate an ARMA model 
from sample data. 


3. Consider various test statistics to check for model adequacy. Several examples
of estimated ARMA models are analyzed in detail. It is shown how a properly
estimated model can be used for forecasting.


1. STOCHASTIC DIFFERENCE EQUATION MODELS 


In this chapter, we continue to work with discrete, rather than continuous,
time-series models. Recall from the discussion in Chapter 1 that we can evaluate the
function $y = f(t)$ at $t_0$ and $t_0 + h$ to form

$$\Delta y = f(t_0 + h) - f(t_0)$$


As a practical matter, most economic time-series data are collected for discrete
time periods. Thus, we consider only the equidistant intervals $t_0$, $t_0 + h$, $t_0 + 2h$,
$t_0 + 3h$, ... and conveniently set $h = 1$. Be careful to recognize, however, that a
discrete time series implies $t$, but not necessarily $y_t$, is discrete. For example, although
Scotland's annual rainfall is a continuous variable, the sequence of such annual
rainfall totals for years 1 through $t$ is a discrete time series. In many economic
applications, $t$ refers to "time" so that $h$ represents the change in time. However, $t$
need not refer to the type of time interval as measured by a clock or calendar.
Instead of allowing our measurement units to be minutes, days, quarters, or years,
we can use $t$ to refer to an ordered event number. We could let $y_t$ denote the
outcome of spin $t$ on a roulette wheel; $y_t$ can then take on any of the 38 values 00, 0, 1,
..., 36.

A discrete variable $y$ is said to be a random variable (i.e., stochastic) if for any
real number $r$, there exists a probability $p(y \le r)$ that $y$ takes on a value less than or
equal to $r$. This definition is fairly general; in common usage, it is typically implied
that there is at least one value of $r$ for which $0 < p(y = r) < 1$. If there is some $r$ for
which $p(y = r) = 1$, $y$ is deterministic rather than random.
It is useful to consider the elements of an observed time series $\{y_0, y_1, y_2, \ldots, y_t\}$
as being realizations (i.e., outcomes) of a stochastic process. As in Chapter 1, we
continue to let the notation $y_t$ refer to an element of the entire sequence $\{y_t\}$. In our
roulette example, $y_t$ denotes the outcome of spin $t$ on a roulette wheel. If we
observe spins 1 through $T$, we can form the sequence $y_1, y_2, \ldots, y_T$ or, more
compactly, $\{y_t\}$. In the same way, the term $y_t$ could be used to denote GNP in time
period $t$. Since we cannot forecast GNP perfectly, $y_t$ is a random variable. Once we
learn the value of GNP in period $t$, $y_t$ becomes one of the realized values from a
stochastic process. (Of course, measurement error may prevent us from ever knowing
the "true" value of GNP.)

For discrete variables, the probability distribution of $y_t$ is given by a formula (or
table) that specifies each possible realized value of $y_t$ and the probability associated
with that realization. If the realizations are linked across time, there exists the joint
probability distribution $p(y_1 = r_1, y_2 = r_2, \ldots, y_T = r_T)$ where $r_i$ is the realized value
of $y$ in period $i$. Having observed the first $t$ realizations, we can form the expected
value of $y_{t+1}, y_{t+2}, \ldots$ conditioned on the observed values of $y_1$ through $y_t$.¹ This
conditional mean, or expected value, of $y_{t+i}$ is denoted by
$E_t(y_{t+i} \mid y_t, y_{t-1}, \ldots, y_1)$ or $E_t y_{t+i}$.


Of course, if $y_t$ refers to the outcome of spinning a fair roulette wheel, the
probability distribution is easily characterized. In contrast, we may never be able to
completely describe the probability distribution for GNP. Nevertheless, the task of
economic theorists is to develop models that capture the essence of the true
data-generating process. Stochastic difference equations are one convenient way of
modeling dynamic economic processes. To take a simple example, suppose that the
Federal Reserve's money supply target grows 3% each year. Hence,


$$m_t^* = 1.03\, m_{t-1}^* \tag{2.1}$$

or given the initial condition $m_0^*$, the particular solution is

$$m_t^* = (1.03)^t m_0^*$$

where $m_t^*$ = the logarithm of the money supply target in year $t$
      $m_0^*$ = the initial condition for the target money supply in period zero


Of course, the actual money supply $m_t$ and target need not be equal. Suppose that
at the end of period $t - 1$, there exist $m_{t-1}$ outstanding dollars that are carried
forward into period $t$. Hence, at the beginning of $t$ there are $m_{t-1}$ dollars so that the gap
between actual and desired money holdings is $m_t^* - m_{t-1}$. Suppose that the Fed
cannot perfectly control the money supply but attempts to change the money supply by
$\rho$ percent ($\rho < 100\%$) of any gap between the desired and actual money supply. We
can model this behavior as

$$\Delta m_t = \rho(m_t^* - m_{t-1}) + \epsilon_t$$

or using (2.1), we obtain

$$m_t = \rho(1.03)^t m_0^* + (1 - \rho) m_{t-1} + \epsilon_t \tag{2.2}$$

where $\epsilon_t$ = the uncontrollable portion of the money supply

We assume the mean of $\epsilon_t$ is zero in all time periods.


Although the economic theory is overly simple, the model does illustrate the key 
points discussed above. Note the following: 


1. Although the money supply is a continuous variable, (2.2) is a discrete
difference equation. Since the forcing process $\{\epsilon_t\}$ is stochastic, the money supply is
stochastic; we can call (2.2) a linear stochastic difference equation.

2. If we knew the distribution of $\{\epsilon_t\}$, we could calculate the distribution for each
element in the $\{m_t\}$ sequence. Since (2.2) shows how the realizations of the $\{m_t\}$
sequence are linked across time, we would be able to calculate the various joint
probabilities. Notice that the distribution of the money supply sequence is
completely determined by the parameters of the difference equation (2.2) and the
distribution of the $\{\epsilon_t\}$ sequence.

3. Having observed the first $t$ observations in the $\{m_t\}$ sequence, we can make
forecasts of $m_{t+1}, m_{t+2}, \ldots$. For example, if we update (2.2) by one period and
take the conditional expectation, the forecast of $m_{t+1}$ is
$\rho(1.03)^{t+1} m_0^* + (1 - \rho) m_t$.
Hence, $E_t m_{t+1} = \rho(1.03)^{t+1} m_0^* + (1 - \rho) m_t$.
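The three points above can be made concrete with a short simulation. The following is only a sketch under stated assumptions: the function names, the Gaussian form of the shock, and the parameter values in the usage note are illustrative choices that do not come from the text.

```python
import random


def simulate_money_supply(rho, m0_target, m_init, periods, sigma=0.0, seed=0):
    """Iterate (2.2): m_t = rho*(1.03**t)*m0_target + (1 - rho)*m_{t-1} + e_t.

    The shock e_t is drawn as Gaussian white noise with standard deviation
    sigma; normality is an assumption made here purely for illustration.
    """
    rng = random.Random(seed)
    path = [m_init]  # path[0] is the period-zero money supply
    for t in range(1, periods + 1):
        shock = rng.gauss(0.0, sigma)
        path.append(rho * (1.03 ** t) * m0_target + (1 - rho) * path[-1] + shock)
    return path


def one_step_forecast(rho, m0_target, m_t, t):
    """Conditional expectation E_t m_{t+1} = rho*(1.03**(t+1))*m0_target + (1 - rho)*m_t."""
    return rho * (1.03 ** (t + 1)) * m0_target + (1 - rho) * m_t
```

With `sigma=0.0` the simulated path coincides with the one-step forecasts, which is a quick way to check that the forecast formula is simply (2.2) updated one period with the shock replaced by its zero mean.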


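Before we proceed too far along these lines, let us go back to the basic building
block of discrete stochastic time-series models: the white-noise process. A
sequence $\{\epsilon_t\}$ is a white-noise process if each value in the sequence has a mean of
zero, a constant variance, and is serially uncorrelated. Formally, if the notation $E(x)$
denotes the theoretical mean value of $x$, the sequence $\{\epsilon_t\}$ is a white-noise process
if for each time period $t$,

$$E(\epsilon_t) = E(\epsilon_{t-1}) = \cdots = 0$$

$$E(\epsilon_t^2) = E(\epsilon_{t-1}^2) = \cdots = \sigma^2 \quad [\text{or } \operatorname{var}(\epsilon_t) = \operatorname{var}(\epsilon_{t-1}) = \cdots = \sigma^2]$$

and for all $j$

$$E(\epsilon_t \epsilon_{t-s}) = E(\epsilon_{t-j} \epsilon_{t-j-s}) = 0 \text{ for all } s \neq 0 \quad [\text{or } \operatorname{cov}(\epsilon_t, \epsilon_{t-s}) = \operatorname{cov}(\epsilon_{t-j}, \epsilon_{t-j-s}) = 0]$$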

In the remainder of this text, $\{\epsilon_t\}$ will always refer to a white-noise process and
$\sigma^2$ to the variance of that process. When it is necessary to refer to two or more
white-noise processes, symbols such as $\{\epsilon_{1t}\}$ and $\{\epsilon_{2t}\}$ will be used. Now, use a
white-noise process to construct the more interesting time series:

$$x_t = \sum_{i=0}^{q} \beta_i \epsilon_{t-i} \tag{2.3}$$


For each period $t$, $x_t$ is constructed by taking the values $\epsilon_t, \epsilon_{t-1}, \ldots, \epsilon_{t-q}$ and
multiplying each by the associated value of $\beta_i$. A sequence formed in this manner is
called a moving average of order $q$ and denoted by MA($q$). To illustrate a typical
moving average process, suppose you win \$1 if a fair coin shows a head and lose
\$1 if it shows a tail. Denote the outcome on toss $t$ by $\epsilon_t$ (i.e., for toss $t$, $\epsilon_t$ is either
+\$1 or -\$1). If you wish to keep track of your "hot streaks," you might want to
calculate your average winnings on the last four tosses. For each coin toss $t$, your
average winnings on the last four tosses are
$\tfrac{1}{4}\epsilon_t + \tfrac{1}{4}\epsilon_{t-1} + \tfrac{1}{4}\epsilon_{t-2} + \tfrac{1}{4}\epsilon_{t-3}$. In terms
of (2.3), this sequence is a moving average process such that $\beta_i = 0.25$ for $i \le 3$ and
zero otherwise.


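Although the $\{\epsilon_t\}$ sequence is a white-noise process, the constructed $\{x_t\}$ sequence
will not be a white-noise process if two or more of the $\beta_i$ differ from zero.
To illustrate using an MA(1) process, set $\beta_0 = 1$, $\beta_1 = 0.5$, and all other $\beta_i = 0$. In
this circumstance, $E(x_t) = E(\epsilon_t + 0.5\epsilon_{t-1}) = 0$ and
$\operatorname{var}(x_t) = \operatorname{var}(\epsilon_t + 0.5\epsilon_{t-1}) = 1.25\sigma^2$.
You can easily convince yourself that $E(x_t) = E(x_{t-s})$ and $\operatorname{var}(x_t) = \operatorname{var}(x_{t-s})$ for all $s$.
Hence, the first two conditions for $\{x_t\}$ to be a white-noise process are satisfied.
However,

$$E(x_t x_{t-1}) = E[(\epsilon_t + 0.5\epsilon_{t-1})(\epsilon_{t-1} + 0.5\epsilon_{t-2})] = E[\epsilon_t\epsilon_{t-1} + 0.5(\epsilon_{t-1})^2 + 0.5\epsilon_t\epsilon_{t-2} + 0.25\epsilon_{t-1}\epsilon_{t-2}] = 0.5\sigma^2$$

Given there exists a nonzero value of $s$ such that $E(x_t x_{t-s}) \neq 0$, the $\{x_t\}$ sequence
is not a white-noise process.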
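The MA(1) calculation above is one instance of a general rule: for a finite-order moving average of white noise, the lag-$s$ autocovariance is $\sigma^2 \sum_i \beta_i \beta_{i+s}$. The sketch below, with the hypothetical helper name `ma_autocovariance`, applies that rule to the example in the text.

```python
def ma_autocovariance(betas, s, sigma2=1.0):
    """Lag-s autocovariance of x_t = sum_i betas[i] * e_{t-i}, with e_t white noise.

    Uses gamma_s = sigma2 * sum_i beta_i * beta_{i+s}; the sum is empty
    (so the autocovariance is zero) once s exceeds the MA order.
    """
    s = abs(s)
    if s >= len(betas):
        return 0.0
    return sigma2 * sum(betas[i] * betas[i + s] for i in range(len(betas) - s))


# The MA(1) example from the text: beta_0 = 1, beta_1 = 0.5, sigma^2 = 1.
variance = ma_autocovariance([1.0, 0.5], 0)      # 1.25
first_autocov = ma_autocovariance([1.0, 0.5], 1)  # 0.5
second_autocov = ma_autocovariance([1.0, 0.5], 2)  # 0.0
```

A nonzero value at lag 1 is what rules out white noise here; the mean and variance conditions alone are satisfied.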

Exercise 1 at the end of this chapter asks you to find the mean, variance, and
covariance of your "hot streaks" in coin tossing. For practice, you should complete
that exercise before continuing.


2. ARMA MODELS 


It is possible to combine a moving average process with a linear difference
equation to obtain an autoregressive moving average model. Consider the $p$th-order
difference equation:

$$y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + x_t \tag{2.4}$$

Now let $\{x_t\}$ be the MA($q$) process given by (2.3) so that we can write

$$y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + \sum_{i=0}^{q} \beta_i \epsilon_{t-i} \tag{2.5}$$


We follow the convention of normalizing units so that $\beta_0$ is always equal to
unity. If the characteristic roots of (2.5) are all in the unit circle, $\{y_t\}$ is called an
autoregressive moving average (ARMA) model for $y_t$. The autoregressive part of
the model is the "difference equation" given by the homogeneous portion of (2.4)
and the moving average part is the $\{x_t\}$ sequence. If the homogeneous part of the
difference equation contains $p$ lags and the model for $x_t$ contains $q$ lags, the model is
called an ARMA($p$, $q$) model. If $q = 0$, the process is called a pure autoregressive process
denoted by AR($p$), and if $p = 0$, the process is a pure moving average process
denoted by MA($q$). In an ARMA model, it is perfectly permissible to allow $p$ and/or $q$
to be infinite. In this chapter, we consider only models in which all the characteristic
roots of (2.4) are within the unit circle. However, if one or more characteristic
roots is greater than or equal to unity, the $\{y_t\}$ sequence is said to be an integrated
process and (2.5) is called an autoregressive integrated moving average (ARIMA)
model.


Treating (2.5) as a difference equation suggests that we can "solve" for $y_t$ in
terms of the $\{\epsilon_t\}$ sequence. The solution of an ARMA($p$, $q$) model expressing $y_t$ in
terms of the $\{\epsilon_t\}$ sequence is the moving average representation of $y_t$. The
procedure is no different from that discussed in Chapter 1. For the AR(1) model
$y_t = a_0 + a_1 y_{t-1} + \epsilon_t$, the moving average representation was shown to be

$$y_t = a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i \epsilon_{t-i}$$

For the general ARMA($p$, $q$) model, rewrite (2.5) using lag operators so that

$$\left(1 - \sum_{i=1}^{p} a_i L^i\right) y_t = a_0 + \sum_{i=0}^{q} \beta_i \epsilon_{t-i}$$

so that the particular solution for $y_t$ is

$$y_t = \frac{a_0 + \sum_{i=0}^{q} \beta_i \epsilon_{t-i}}{1 - \sum_{i=1}^{p} a_i L^i} \tag{2.6}$$

Fortunately, it will not be necessary for us to expand (2.6) to obtain the specific
coefficient for each element in $\{\epsilon_t\}$. The important point to recognize is that the
expansion will yield an MA($\infty$) process. The issue is whether such an expansion is
convergent so that the stochastic difference equation given by (2.6) is stable. As
you will see in the next section, the stability condition is that the characteristic roots
of the polynomial $(1 - \sum a_i L^i)$ must lie outside of the unit circle. It is also shown that
if $y_t$ is a linear stochastic difference equation, the stability condition is a necessary
condition for the time series $\{y_t\}$ to be stationary.


3. STATIONARITY 


Suppose that the quality control division of a manufacturing firm samples four
machines each hour. Every hour, quality control finds the mean of the machines'
output levels. The plot of each machine's hourly output is shown in Figure 2.1. If $y_{it}$
represents machine $i$'s output at hour $t$, the means ($\bar{y}_t$) are readily calculated as

$$\bar{y}_t = \sum_{i=1}^{4} y_{it}/4$$


For hours 5, 10, and 15, these mean values are 5.57, 5.59, and 5.73, respectively. 

The sample variance for each hour can similarly be constructed. Unfortunately,
applied econometricians do not usually have the luxury of being able to obtain an
ensemble (i.e., multiple time-series data of the same process over the same time
period). Typically, we observe only one set of realizations for any particular series.
Fortunately, if $\{y_t\}$ is a stationary series, the mean, variance, and autocorrelations
can usually be well approximated by sufficiently long time averages based on the
single set of realizations. Suppose you observed only the output of machine 1 for
20 periods. If you knew that the output was stationary, you could approximate the
mean level of output by

$$\bar{y} = \sum_{t=1}^{20} y_{1t}/20$$


In using this approximation, you would be assuming that the mean was the same
for each period. In this example, the means of the four series are 5.45, 5.66, 5.45,
and 5.71.

[Figure 2.1 Hourly output of four machines.]

Formally, a stochastic process having a finite mean and variance is
covariance stationary if for all $t$ and $t - s$,


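$$E(y_t) = E(y_{t-s}) = \mu \tag{2.7}$$

$$E[(y_t - \mu)^2] = E[(y_{t-s} - \mu)^2] = \sigma_y^2 \quad [\operatorname{var}(y_t) = \operatorname{var}(y_{t-s}) = \sigma_y^2] \tag{2.8}$$

$$E[(y_t - \mu)(y_{t-s} - \mu)] = E[(y_{t-j} - \mu)(y_{t-j-s} - \mu)] = \gamma_s \quad [\operatorname{cov}(y_t, y_{t-s}) = \operatorname{cov}(y_{t-j}, y_{t-j-s}) = \gamma_s] \tag{2.9}$$

where $\mu$, $\sigma_y^2$, and all $\gamma_s$ are constants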


In (2.9), allowing $s = 0$ means that $\gamma_0$ is equivalent to the variance of $y_t$. Simply
put, a time series is covariance stationary if its mean and all autocovariances are
unaffected by a change of time origin. In the literature, a covariance stationary
process is also referred to as a weakly stationary, second-order stationary, or
wide-sense stationary process. A strongly stationary process need not have a finite mean
and/or variance (i.e., $\mu$ and/or $\gamma_0$ need not be finite); this terminology implies that
weak stationarity can be a more stringent condition than strong stationarity. The
text considers only covariance stationary series so that there is no ambiguity in
using the terms stationary and covariance stationary interchangeably. One further
word about terminology. In multivariate models, the term autocovariance is
reserved for the covariance between $y_t$ and its own lags. Cross-covariance refers to
the covariance between one series and another. In univariate time-series models,
there is no ambiguity and the terms autocovariance and covariance are used
interchangeably.


For a covariance stationary series, we can define the autocorrelation between $y_t$
and $y_{t-s}$ as

$$\rho_s = \gamma_s/\gamma_0$$

where $\gamma_0$ and $\gamma_s$ are defined by (2.9).
Since $\gamma_s$ and $\gamma_0$ are time-independent, the autocorrelation coefficients $\rho_s$ are also
time-independent. Although the autocorrelation between $y_t$ and $y_{t-1}$ can differ from
the autocorrelation between $y_t$ and $y_{t-2}$, the autocorrelation between $y_t$ and $y_{t-1}$ must
be identical to that between $y_{t-s}$ and $y_{t-s-1}$. Obviously, $\rho_0 = 1$.


Stationarity Restrictions for an AR(1) Process 


For expositional convenience, first consider the necessary and sufficient conditions
for an AR(1) process to be stationary. Let

$$y_t = a_0 + a_1 y_{t-1} + \epsilon_t$$

where $\epsilon_t$ = white noise

Suppose that the process started in period zero, so that $y_0$ is a deterministic initial
condition. In Section 3 of the last chapter, it was shown that the solution to this
equation is (also see Question 2 at the end of this chapter)

$$y_t = a_0 \sum_{i=0}^{t-1} a_1^i + a_1^t y_0 + \sum_{i=0}^{t-1} a_1^i \epsilon_{t-i} \tag{2.10}$$
i=0 i=0 


Taking the expected value of (2.10), we obtain

$$E y_t = a_0 \sum_{i=0}^{t-1} a_1^i + a_1^t y_0 \tag{2.11}$$

Updating by $s$ periods yields

$$E y_{t+s} = a_0 \sum_{i=0}^{t+s-1} a_1^i + a_1^{t+s} y_0 \tag{2.12}$$

If we compare (2.11) and (2.12), it is clear that both means are time-dependent.
Since $E y_t$ is not equal to $E y_{t+s}$, the sequence cannot be stationary. However, if $t$ is
large, we can consider the limiting value of $y_t$ in (2.10). If $|a_1| < 1$, the expression
$(a_1)^t y_0$ converges to zero as $t$ becomes infinitely large and the sum
$a_0[1 + a_1 + (a_1)^2 + (a_1)^3 + \cdots]$ converges to $a_0/(1 - a_1)$. Thus, as $t \to \infty$ and if $|a_1| < 1$,

$$\lim y_t = a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i \epsilon_{t-i} \tag{2.13}$$

Now take expectations of (2.13) so that for sufficiently large values of $t$,
$E y_t = a_0/(1 - a_1)$. Thus, the mean value of $y_t$ is finite and time-independent so that
$E y_t = E y_{t-s} = \mu$ for all $t$. Turning to the limiting value of the variance, we find

$$E(y_t - \mu)^2 = E[(\epsilon_t + a_1 \epsilon_{t-1} + (a_1)^2 \epsilon_{t-2} + \cdots)^2] = \sigma^2[1 + (a_1)^2 + (a_1)^4 + \cdots] = \sigma^2/[1 - (a_1)^2]$$

which is also finite and time-independent. Finally, it is easily demonstrated that the
limiting values of all autocovariances are finite and time-independent:

$$E[(y_t - \mu)(y_{t-s} - \mu)] = E\{[\epsilon_t + a_1 \epsilon_{t-1} + (a_1)^2 \epsilon_{t-2} + \cdots][\epsilon_{t-s} + a_1 \epsilon_{t-s-1} + (a_1)^2 \epsilon_{t-s-2} + \cdots]\}$$
$$= \sigma^2 (a_1)^s [1 + (a_1)^2 + (a_1)^4 + \cdots] = \sigma^2 (a_1)^s/[1 - (a_1)^2] \tag{2.14}$$
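To see how quickly the time-dependent mean in (2.11) approaches its limit, the finite-$t$ expression can be evaluated directly and compared with $a_0/(1 - a_1)$. This is a minimal sketch; the function names and the parameter values in the usage lines are illustrative assumptions, not taken from the text.

```python
def ar1_mean(a0, a1, y0, t):
    """E y_t from (2.11): a0 * sum_{i=0}^{t-1} a1**i + a1**t * y0."""
    return a0 * sum(a1 ** i for i in range(t)) + (a1 ** t) * y0


def ar1_limits(a0, a1, sigma2=1.0):
    """Limiting mean a0/(1 - a1) and variance sigma2/(1 - a1**2), valid for |a1| < 1."""
    if abs(a1) >= 1:
        raise ValueError("limiting moments exist only if |a1| < 1")
    return a0 / (1 - a1), sigma2 / (1 - a1 ** 2)


# Illustrative values: a0 = 2, a1 = 0.5, and an initial condition far from the limit.
early_mean = ar1_mean(2.0, 0.5, 10.0, 1)    # still depends heavily on y0
late_mean = ar1_mean(2.0, 0.5, 10.0, 200)   # essentially a0/(1 - a1)
limit_mean, limit_var = ar1_limits(2.0, 0.5)
```

The gap between `early_mean` and `limit_mean` is the influence of the initial condition, which is why a process that has only recently begun may not look stationary.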
In summary, if we can use the limiting value of (2.10), the $\{y_t\}$ sequence will be
stationary. For any given $y_0$ and $|a_1| < 1$, it follows that $t$ must be sufficiently
large. Thus, if a sample is generated by a process that has recently begun, the
realizations may not be stationary. It is for this very reason that many econometricians
assume that the data-generating process has been occurring for an infinitely long
time. In practice, the researcher must be wary of any data generated from a "new"
process. For example, $\{y_t\}$ could represent the daily change in the dollar/mark
exchange rate beginning immediately after the demise of the Bretton Woods fixed
exchange rate system. Such a series may not be stationary because there were
deterministic initial conditions (exchange rate changes were essentially zero in the
Bretton Woods era). The careful researcher wishing to use stationary series might
consider excluding some of these earlier observations from the period of analysis.
Little would change had we not been given the initial condition. Without the
initial value $y_0$, the sum of the homogeneous and particular solutions for $y_t$ is

$$y_t = a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i \epsilon_{t-i} + A(a_1)^t \tag{2.15}$$

where $A$ = an arbitrary constant


If we take the expectation of (2.15), it is clear that the $\{y_t\}$ sequence cannot be
stationary unless the expression $A(a_1)^t$ is equal to zero. Either the sequence must have
started infinitely long ago (so that $a_1^t = 0$) or the arbitrary constant $A$ must be zero.
Recall that the arbitrary constant has the interpretation of a deviation from long-run
equilibrium. A succinct way to state the stability conditions is the following:

1. The homogeneous solution must be zero. Either the sequence must have started
infinitely far in the past or the process must always be in equilibrium (so that the
arbitrary constant is zero).

2. The characteristic root $a_1$ must be less than unity in absolute value.


These two conditions readily generalize to all ARMA($p$, $q$) processes. We know
that the homogeneous solution to (2.5) has the form

$$\sum_{i=1}^{p} A_i \alpha_i^t$$

or if there are $m$ repeated roots,

$$\alpha^t \sum_{i=1}^{m} A_i t^{i-1} + \sum_{i=m+1}^{p} A_i \alpha_i^t$$

where the $A_i$ represent $p$ arbitrary values, $\alpha$ is the repeated root, and the $\alpha_i$ are the
$(p - m)$ distinct roots.

If any portion of the homogeneous equation is present, the mean, variance, and 
all covariances will be time-dependent. Hence, for any ARMA (p, q) model, station- 
arity necessitates that the homogeneous solution be zero. The next section ad- 
dresses the stationarity restrictions for the particular solution. 


4. STATIONARITY RESTRICTIONS FOR AN 
ARMA(p, q) MODEL


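As a prelude to the stationarity conditions for the general ARMA($p$, $q$) model, first
consider the restrictions necessary to ensure that an ARMA(2, 1) model is
stationary. Since the magnitude of the intercept term does not affect the stability (or
stationarity) conditions, set $a_0 = 0$ and write

$$y_t = a_1 y_{t-1} + a_2 y_{t-2} + \epsilon_t + \beta_1 \epsilon_{t-1} \tag{2.16}$$

From the previous section, we know that the homogeneous solution must be
zero. As such, it is only necessary to find the particular solution. Using the method
of undetermined coefficients, we can write the challenge solution as

$$y_t = \sum_{i=0}^{\infty} \alpha_i \epsilon_{t-i} \tag{2.17}$$

For (2.17) to be a solution of (2.16), the various $\alpha_i$ must satisfy

$$\alpha_0 \epsilon_t + \alpha_1 \epsilon_{t-1} + \alpha_2 \epsilon_{t-2} + \alpha_3 \epsilon_{t-3} + \cdots = a_1(\alpha_0 \epsilon_{t-1} + \alpha_1 \epsilon_{t-2} + \alpha_2 \epsilon_{t-3} + \alpha_3 \epsilon_{t-4} + \cdots)$$
$$+ a_2(\alpha_0 \epsilon_{t-2} + \alpha_1 \epsilon_{t-3} + \alpha_2 \epsilon_{t-4} + \alpha_3 \epsilon_{t-5} + \cdots) + \epsilon_t + \beta_1 \epsilon_{t-1}$$

To match coefficients on the terms containing $\epsilon_t, \epsilon_{t-1}, \epsilon_{t-2}, \ldots$, it is necessary to set

1. $\alpha_0 = 1$
2. $\alpha_1 = a_1 \alpha_0 + \beta_1 \Rightarrow \alpha_1 = a_1 + \beta_1$
3. $\alpha_i = a_1 \alpha_{i-1} + a_2 \alpha_{i-2}$ for all $i \ge 2$
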

The key point is that for $i \ge 2$, the coefficients satisfy the difference equation
$\alpha_i = a_1 \alpha_{i-1} + a_2 \alpha_{i-2}$. If the characteristic roots of (2.16) are within the unit circle, the
$\{\alpha_i\}$ must constitute a convergent sequence. For example, reconsider the case in
which $a_1 = 1.6$, $a_2 = -0.9$, and let $\beta_1 = 0.5$. Worksheet 2.1 shows that the
coefficients satisfying (2.17) are 1, 2.1, 2.46, 2.046, 1.06, -0.146, ... (also see
Worksheet 1.2 of the previous chapter).


WORKSHEET 2.1 Coefficients of the ARMA(2,1) Process:
$y_t = 1.6 y_{t-1} - 0.9 y_{t-2} + \epsilon_t + 0.5 \epsilon_{t-1}$

If we use the method of undetermined coefficients, the $\alpha_i$ must satisfy

$\alpha_0 = 1$
$\alpha_1 = 1.6 + 0.5$; hence, $\alpha_1 = 2.1$
$\alpha_i = 1.6\,\alpha_{i-1} - 0.9\,\alpha_{i-2}$ for all $i = 2, 3, 4, \ldots$

Notice that the coefficients follow a second-order difference equation with
imaginary roots. With de Moivre's theorem, the coefficients will satisfy

$$\alpha_i = 0.949^i \beta_1 \cos(0.567 i + \beta_2)$$

Imposing the initial conditions for $\alpha_0$ and $\alpha_1$ yields

$$1 = \beta_1 \cos(\beta_2) \quad \text{and} \quad 2.1 = 0.949 \beta_1 \cos(0.567 + \beta_2)$$

Since $\beta_1 = 1/\cos(\beta_2)$, we seek the solution to

$$\cos(\beta_2) - (0.949/2.1) \cos(0.567 + \beta_2) = 0$$

From a trig table, the solution for $\beta_2$ is $-1.197$. Hence, the $\alpha_i$ satisfy

$$\alpha_i = [1/\cos(-1.197)] \cdot 0.949^i \cos(0.567 i - 1.197)$$

Alternatively, we can use the initial values of $\alpha_0$ and $\alpha_1$ to find the other $\alpha_i$ by
iteration. The sequence of the $\alpha_i$ is shown in the graph below.


[Graph: the $\alpha_i$ plotted against $i$.]

The values of the sequence for $i = 0$ through 10 are

  i      0     1     2     3      4     5       6       7       8       9       10
  α_i    1.00  2.10  2.46  2.046  1.06  -0.146  -1.187  -1.768  -1.761  -1.226  -0.376
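The iteration described in Worksheet 2.1 is easy to reproduce, and doing so gives a check on both the tabulated coefficients and the trigonometric closed form. The sketch below uses hypothetical function names; the closed form recomputes the modulus, angle, and the two constants from $a_1$, $a_2$, and $\beta_1$ rather than using the rounded values 0.949, 0.567, and $-1.197$, and it assumes complex roots.

```python
import math


def arma21_ma_coefficients(a1, a2, beta1, n):
    """alpha_0 = 1, alpha_1 = a1 + beta1, alpha_i = a1*alpha_{i-1} + a2*alpha_{i-2} for i >= 2."""
    alphas = [1.0, a1 + beta1]
    for _ in range(2, n + 1):
        alphas.append(a1 * alphas[-1] + a2 * alphas[-2])
    return alphas[: n + 1]


def arma21_closed_form(a1, a2, beta1, i):
    """alpha_i = b1 * r**i * cos(theta*i + b2); assumes complex roots (a1**2 + 4*a2 < 0)."""
    r = math.sqrt(-a2)                 # modulus of the complex roots
    theta = math.acos(a1 / (2 * r))    # angle of the roots
    # b2 solves cos(b2) = (r / alpha_1) * cos(theta + b2), with b1 = 1 / cos(b2)
    b2 = math.atan((r * math.cos(theta) - (a1 + beta1)) / (r * math.sin(theta)))
    b1 = 1.0 / math.cos(b2)
    return b1 * r ** i * math.cos(theta * i + b2)


alphas = arma21_ma_coefficients(1.6, -0.9, 0.5, 10)
rounded = [round(a, 3) for a in alphas]
```

Both routes give the same numbers up to floating-point error, and the coefficients oscillate with shrinking amplitude because the roots have modulus below one.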


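To verify that the $\{y_t\}$ sequence generated by (2.17) is stationary, take the
expectation of (2.17) to form $E y_t = E y_{t-i} = 0$ for all $t$ and $i$. Hence, the mean is finite and
time-invariant. Since the $\{\epsilon_t\}$ sequence is assumed to be a white-noise process, the
variance of $y_t$ is constant and time-independent, that is,

$$\operatorname{Var}(y_t) = E[(\alpha_0 \epsilon_t + \alpha_1 \epsilon_{t-1} + \alpha_2 \epsilon_{t-2} + \alpha_3 \epsilon_{t-3} + \cdots)^2] = \sigma^2 \sum_{i=0}^{\infty} \alpha_i^2$$

Hence, $\operatorname{var}(y_t) = \operatorname{var}(y_{t-s})$ for all $t$ and $s$. Finally, the covariance between $y_t$ and
$y_{t-s}$ is

$$\operatorname{Cov}(y_t, y_{t-1}) = E[(\epsilon_t + \alpha_1 \epsilon_{t-1} + \alpha_2 \epsilon_{t-2} + \cdots)(\epsilon_{t-1} + \alpha_1 \epsilon_{t-2} + \alpha_2 \epsilon_{t-3} + \alpha_3 \epsilon_{t-4} + \cdots)]$$
$$= \sigma^2(\alpha_1 + \alpha_2 \alpha_1 + \alpha_3 \alpha_2 + \cdots)$$

$$\operatorname{Cov}(y_t, y_{t-2}) = E[(\epsilon_t + \alpha_1 \epsilon_{t-1} + \alpha_2 \epsilon_{t-2} + \cdots)(\epsilon_{t-2} + \alpha_1 \epsilon_{t-3} + \alpha_2 \epsilon_{t-4} + \alpha_3 \epsilon_{t-5} + \cdots)]$$
$$= \sigma^2(\alpha_2 + \alpha_3 \alpha_1 + \alpha_4 \alpha_2 + \cdots)$$

so that

$$\operatorname{Cov}(y_t, y_{t-s}) = \sigma^2(\alpha_s + \alpha_{s+1} \alpha_1 + \alpha_{s+2} \alpha_2 + \cdots) \tag{2.18}$$

Hence, $\operatorname{cov}(y_t, y_{t-s})$ is constant and independent of $t$. Instead, if the characteristic
roots of (2.16) do not lie within the unit circle, the $\{\alpha_i\}$ sequence will not be
convergent. As such, the $\{y_t\}$ sequence cannot be convergent.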

It is not too difficult to generalize these results to the entire class of ARMA($p$, $q$)
models. Begin by considering the conditions ensuring the stationarity of a pure
MA($\infty$) process. By appropriately restricting the $\beta_i$, all the finite-order MA($q$)
processes can be obtained as special cases. Consider

$$x_t = \sum_{i=0}^{\infty} \beta_i \epsilon_{t-i}$$

where $\{\epsilon_t\}$ = a white-noise process with variance $\sigma^2$

We have already determined that $\{x_t\}$ is not a white-noise process; now the issue
is whether $\{x_t\}$ is covariance stationary. (If you need to refresh your memory
concerning mathematical expectations, you should consult the appendix to this chapter
before proceeding.) Considering conditions (2.7), (2.8), and (2.9), we ask the
following:


1. Is the mean finite and time-independent? Take the expected value of $x_t$ and
remember that the expectation of a sum is the sum of the individual expectations.
Hence,

$$E(x_t) = E(\epsilon_t + \beta_1 \epsilon_{t-1} + \beta_2 \epsilon_{t-2} + \cdots) = E\epsilon_t + \beta_1 E\epsilon_{t-1} + \beta_2 E\epsilon_{t-2} + \cdots = 0$$

Repeat the procedure with $x_{t-s}$:

$$E(x_{t-s}) = E(\epsilon_{t-s} + \beta_1 \epsilon_{t-s-1} + \beta_2 \epsilon_{t-s-2} + \cdots) = 0$$

Hence, all elements in the $\{x_t\}$ sequence have the same finite mean ($\mu = 0$).
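2. Is the variance finite and time-independent? Form $\operatorname{var}(x_t)$ as

$$\operatorname{Var}(x_t) = E[(\epsilon_t + \beta_1 \epsilon_{t-1} + \beta_2 \epsilon_{t-2} + \cdots)^2]$$

Square the term in parentheses and take expectations. Since $\{\epsilon_t\}$ is a
white-noise process, all terms $E\epsilon_t \epsilon_{t-s} = 0$ for $s \neq 0$. Hence,

$$\operatorname{Var}(x_t) = E(\epsilon_t)^2 + (\beta_1)^2 E(\epsilon_{t-1})^2 + (\beta_2)^2 E(\epsilon_{t-2})^2 + \cdots = \sigma^2[1 + (\beta_1)^2 + (\beta_2)^2 + \cdots]$$

As long as $\sum(\beta_i)^2$ is finite, it follows that $\operatorname{var}(x_t)$ is finite. Thus, $\sum(\beta_i)^2$ being
finite is a necessary condition for $\{x_t\}$ to be stationary. To determine whether
$\operatorname{var}(x_t) = \operatorname{var}(x_{t-s})$, form

$$\operatorname{Var}(x_{t-s}) = E[(\epsilon_{t-s} + \beta_1 \epsilon_{t-s-1} + \beta_2 \epsilon_{t-s-2} + \cdots)^2] = \sigma^2[1 + (\beta_1)^2 + (\beta_2)^2 + \cdots]$$

Thus, $\operatorname{var}(x_t) = \operatorname{var}(x_{t-s})$ for all $t$ and $t - s$.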


3. Are all autocovariances finite and time-independent? First form $E(x_t x_{t-s})$ as

$$E(x_t x_{t-s}) = E[(\epsilon_t + \beta_1 \epsilon_{t-1} + \beta_2 \epsilon_{t-2} + \cdots)(\epsilon_{t-s} + \beta_1 \epsilon_{t-s-1} + \beta_2 \epsilon_{t-s-2} + \cdots)]$$

Carrying out the multiplication and noting that $E(\epsilon_t \epsilon_{t-s}) = 0$ for $s \neq 0$, we get

$$E(x_t x_{t-s}) = \sigma^2(\beta_s + \beta_1 \beta_{s+1} + \beta_2 \beta_{s+2} + \cdots)$$

Restricting the sum $\beta_s + \beta_1 \beta_{s+1} + \beta_2 \beta_{s+2} + \cdots$ to be finite means that $E(x_t x_{t-s})$ is
finite. Given this second restriction, it is clear that the covariance between $x_t$ and
$x_{t-s}$ depends on only the number of periods separating the variables (i.e., the
value of $s$), but not the time subscript $t$.



In summary, the necessary and sufficient conditions for any MA process to be
stationary are for the sums of (1), $\sum(\beta_i)^2$, and of (2), $(\beta_s + \beta_1 \beta_{s+1} + \beta_2 \beta_{s+2} + \cdots)$, to
be finite. Since (2) must hold for all values of $s$ and $\beta_0 = 1$, condition (1) is
redundant. The direct implication is that a finite-order MA process will always be
stationary. For an infinite-order process, (2) must hold for all $s \ge 0$.


Stationarity Restrictions for the Autoregressive Coefficients 


Now consider the pure autoregressive model:

$$y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + \epsilon_t \tag{2.19}$$

If the characteristic roots of the homogeneous equation of (2.19) all lie inside the
unit circle, it is possible to write the particular solution as

$$y_t = \frac{a_0}{1 - \sum_{i=1}^{p} a_i} + \sum_{i=0}^{\infty} \alpha_i \epsilon_{t-i} \tag{2.20}$$

where the $\alpha_i$ = undetermined coefficients

Although it is possible to find the undetermined coefficients $\{\alpha_i\}$, we know that
(2.20) is a convergent sequence so long as the characteristic roots of (2.19) are
inside the unit circle. To sketch the proof, the method of undetermined coefficients
allows us to write the particular solution in the form of (2.20). We also know that
the sequence $\{\alpha_i\}$ will eventually solve the difference equation:

$$\alpha_i - a_1 \alpha_{i-1} - a_2 \alpha_{i-2} - \cdots - a_p \alpha_{i-p} = 0 \tag{2.21}$$

If the characteristic roots of (2.21) are all inside the unit circle, the $\{\alpha_i\}$ sequence
will be convergent. Although (2.20) is an infinite-order moving average process,
the convergence of the MA coefficients implies that $\sum \alpha_i^2$ is finite. Hence, we can
use (2.20) to check the three conditions for stationarity. Since $\alpha_0 = 1$,

1. $E y_t = E y_{t-s} = a_0/(1 - \sum a_i)$

You should recall from Chapter 1 that a necessary condition for all characteristic
roots to lie inside the unit circle is $1 - \sum a_i \neq 0$. Hence, the mean of the sequence is
finite and time-invariant.

2. $\operatorname{Var}(y_t) = E[(\epsilon_t + \alpha_1 \epsilon_{t-1} + \alpha_2 \epsilon_{t-2} + \alpha_3 \epsilon_{t-3} + \cdots)^2] = \sigma^2 \sum \alpha_i^2$

and

$\operatorname{Var}(y_{t-s}) = E[(\epsilon_{t-s} + \alpha_1 \epsilon_{t-s-1} + \alpha_2 \epsilon_{t-s-2} + \alpha_3 \epsilon_{t-s-3} + \cdots)^2] = \sigma^2 \sum \alpha_i^2$

Given that $\sum \alpha_i^2$ is finite, the variance is finite and time-independent.

3. $\operatorname{Cov}(y_t, y_{t-s}) = E[(\epsilon_t + \alpha_1 \epsilon_{t-1} + \alpha_2 \epsilon_{t-2} + \cdots)(\epsilon_{t-s} + \alpha_1 \epsilon_{t-s-1} + \alpha_2 \epsilon_{t-s-2} + \cdots)]$
$= \sigma^2(\alpha_s + \alpha_1 \alpha_{s+1} + \alpha_2 \alpha_{s+2} + \cdots)$

Thus, the covariance between $y_t$ and $y_{t-s}$ is constant and time-invariant for all $t$
and $t - s$.

Nothing of substance is changed by combining the AR($p$) and MA($q$) models
into the general ARMA($p$, $q$) model:

$$y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + x_t$$

$$x_t = \sum_{i=0}^{q} \beta_i \epsilon_{t-i} \tag{2.22}$$

If the roots of the inverse characteristic equation lie outside of the unit circle
[i.e., if the roots of the homogeneous form of (2.22) lie inside the unit circle] and
the $\{x_t\}$ sequence is stationary, the $\{y_t\}$ sequence will be stationary. Consider

$$y_t = \frac{a_0}{1 - \sum_{i=1}^{p} a_i} + \frac{\epsilon_t}{1 - \sum_{i=1}^{p} a_i L^i} + \frac{\beta_1 \epsilon_{t-1}}{1 - \sum_{i=1}^{p} a_i L^i} + \frac{\beta_2 \epsilon_{t-2}}{1 - \sum_{i=1}^{p} a_i L^i} + \cdots \tag{2.23}$$

With very little effort, you can convince yourself that the $\{y_t\}$ sequence satisfies
the three conditions for stationarity. Each of the expressions on the right-hand side
of (2.23) is stationary as long as the roots of $1 - \sum a_i L^i$ are outside the unit circle.
Given that $\{x_t\}$ is stationary, only the roots of the autoregressive portion of (2.22)
determine whether the $\{y_t\}$ sequence is stationary.

What about the possibility of using the forward-looking solution? For example,
in Cagan's monetary model you saw that the forward-looking solution yields a
convergent [...] After all, if you had perfect
foresight, econometric forecasting would be unnecessary.

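Because only the autoregressive portion matters for stationarity, checking a second-order model reduces to finding two characteristic roots. The sketch below does this for the AR(2) case with the quadratic formula; the function names are hypothetical, and the restriction to $p = 2$ is a simplification made here so that only the standard library is needed.

```python
import cmath


def ar2_characteristic_roots(a1, a2):
    """Roots of alpha**2 - a1*alpha - a2 = 0, the homogeneous form of
    y_t = a1*y_{t-1} + a2*y_{t-2} + x_t. The roots of the inverse
    characteristic equation 1 - a1*L - a2*L**2 = 0 are their reciprocals."""
    disc = cmath.sqrt(a1 * a1 + 4 * a2)
    return (a1 + disc) / 2, (a1 - disc) / 2


def is_stationary_ar2(a1, a2):
    """True when both characteristic roots lie strictly inside the unit circle."""
    return all(abs(root) < 1 for root in ar2_characteristic_roots(a1, a2))


# The ARMA(2, 1) example of Worksheet 2.1: complex roots with modulus sqrt(0.9).
roots = ar2_characteristic_roots(1.6, -0.9)
stable = is_stationary_ar2(1.6, -0.9)   # True
unit_root = is_stationary_ar2(1.0, 0.0)  # False: one root equals unity
```

The moving average coefficients never enter the check, which mirrors the statement that a finite-order MA component is always stationary.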


5. THE AUTOCORRELATION FUNCTION 


The autocovariances and autocorrelations of the type found in (2.18) serve as useful
tools in the Box-Jenkins (1976) approach to identifying and estimating time-series
models. We illustrate by considering four important examples: the AR(1), AR(2),
MA(1), and ARMA(1, 1) models. For the AR(1) model, $y_t = a_0 + a_1 y_{t-1} + \epsilon_t$, (2.14)
shows

$$\gamma_0 = \sigma^2/[1 - (a_1)^2]$$
$$\gamma_s = \sigma^2 (a_1)^s/[1 - (a_1)^2]$$

Forming the autocorrelations by dividing each $\gamma_s$ by $\gamma_0$, we find that $\rho_0 = 1$,
$\rho_1 = a_1$, $\rho_2 = (a_1)^2, \ldots, \rho_s = (a_1)^s$. For an AR(1) process, a necessary condition for
stationarity is for $|a_1| < 1$. Thus, the plot of $\rho_s$ against $s$, called the autocorrelation
function (ACF) or correlogram, should converge to zero geometrically if the
series is stationary. If $a_1$ is positive, convergence will be direct, and if $a_1$ is negative,
the autocorrelations will follow a dampened oscillatory path around zero. The first
two graphs on the left-hand side of Figure 2.2 show the theoretical autocorrelation
functions for $a_1 = 0.7$ and $a_1 = -0.7$, respectively. Here, $\rho_0$ is not shown since its
value is necessarily unity.
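The two AR(1) patterns described above follow directly from $\rho_s = (a_1)^s$. A minimal sketch, with a hypothetical function name, that tabulates the theoretical ACF for the two cases plotted in Figure 2.2:

```python
def ar1_acf(a1, max_lag):
    """Theoretical autocorrelations rho_s = a1**s for s = 0, ..., max_lag (requires |a1| < 1)."""
    if abs(a1) >= 1:
        raise ValueError("the AR(1) process is stationary only if |a1| < 1")
    return [a1 ** s for s in range(max_lag + 1)]


direct = ar1_acf(0.7, 5)        # positive values decaying geometrically
oscillating = ar1_acf(-0.7, 5)  # same magnitudes with alternating signs
```

The magnitudes are identical in the two cases; only the sign pattern distinguishes direct from oscillatory convergence.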


The Autocorrelation Function of an AR(2) Process

Now consider the more complicated AR(2) process $y_t = a_1 y_{t-1} + a_2 y_{t-2} + \epsilon_t$. We
omit an intercept term ($a_0$) since it has no effect on the ACF. For the second-order
process to be stationary, we know that it is necessary to restrict the roots of
$(1 - a_1 L - a_2 L^2)$ to be outside the unit circle. In Section 4, we derived the
autocovariances of an ARMA(2, 1) process by use of the method of undetermined
coefficients. Now we want to illustrate an alternative technique using the Yule-Walker
equations. Multiply the second-order difference equation by $y_{t-s}$ for $s = 0$, $s = 1$,
$s = 2, \ldots$ and take expectations to form

$$E y_t y_t = a_1 E y_{t-1} y_t + a_2 E y_{t-2} y_t + E \epsilon_t y_t$$
$$E y_t y_{t-1} = a_1 E y_{t-1} y_{t-1} + a_2 E y_{t-2} y_{t-1} + E \epsilon_t y_{t-1}$$
$$E y_t y_{t-2} = a_1 E y_{t-1} y_{t-2} + a_2 E y_{t-2} y_{t-2} + E \epsilon_t y_{t-2}$$
$$\vdots$$
$$E y_t y_{t-s} = a_1 E y_{t-1} y_{t-s} + a_2 E y_{t-2} y_{t-s} + E \epsilon_t y_{t-s} \tag{2.24}$$

By definition, the autocovariances of a stationary series are such that
$E y_t y_{t-s} = E y_{t-s} y_t = E y_{t-k} y_{t-k-s} = \gamma_s$. We also know that the coefficient on $\epsilon_t$ is unity so that
$E \epsilon_t y_t = \sigma^2$. Since $E \epsilon_t y_{t-s} = 0$ for $s > 0$, we can use the equations in (2.24) to form

$$\gamma_0 = a_1 \gamma_1 + a_2 \gamma_2 + \sigma^2 \tag{2.25}$$


[Figure 2.2 Theoretical ACF and PACF patterns. Panels recoverable from the source, each with an ACF and a PACF plot: 0.7y(t-1) + ε(t); -0.7y(t-1) + ε(t); ε(t) - 0.7ε(t-1).]



$$\gamma_1 = a_1 \gamma_0 + a_2 \gamma_1 \tag{2.26}$$
$$\gamma_s = a_1 \gamma_{s-1} + a_2 \gamma_{s-2} \tag{2.27}$$


Dividing (2.26) and (2.27) by $\gamma_0$ yields

$$\rho_1 = a_1 \rho_0 + a_2 \rho_1 \tag{2.28}$$
$$\rho_s = a_1 \rho_{s-1} + a_2 \rho_{s-2} \tag{2.29}$$

We know that $\rho_0 = 1$, so that from (2.28), $\rho_1 = a_1/(1 - a_2)$. Hence, we can find all
$\rho_s$ for $s \ge 2$ by solving the difference equation (2.29). For example, for $s = 2$ and
$s = 3$,

$$\rho_2 = (a_1)^2/(1 - a_2) + a_2$$
$$\rho_3 = a_1[(a_1)^2/(1 - a_2) + a_2] + a_2 a_1/(1 - a_2)$$

Although the values of the $\rho_s$ are cumbersome to derive, we can easily
characterize their properties. Given the solutions for $\rho_0$ and $\rho_1$, the key point to note is that
the $\rho_s$ all satisfy the difference equation (2.29). As in the general case of a
second-order difference equation, the solution may be oscillatory or direct. Note that the
stationarity condition for $y_t$ necessitates that the characteristic roots of (2.29) lie
inside of the unit circle. Hence, the $\{\rho_s\}$ sequence must be convergent. The
correlogram for an AR(2) process must be such that $\rho_0 = 1$ and $\rho_1$ is determined by (2.28).
These two values can be viewed as "initial values" for the second-order difference
equation (2.29).

The fourth graph on the left-hand side of Figure 2.2 shows the ACF for the process $y_t = 0.7y_{t-1} - 0.49y_{t-2} + \varepsilon_t$. The properties of the various $\rho_s$ follow directly from the homogeneous equation $y_t - 0.7y_{t-1} + 0.49y_{t-2} = 0$. The roots are obtained from the solution to

$$\alpha = \{0.7 \pm [(-0.7)^2 - 4(0.49)]^{1/2}\}/2$$

Since the discriminant $d = (-0.7)^2 - 4(0.49)$ is negative, the characteristic roots are complex so that the solution oscillates. However, since $a_2 = -0.49$, the solution is convergent and the $\{y_t\}$ sequence is stationary.
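The recursion in (2.28) and (2.29) is easy to check numerically. The following is a minimal Python sketch, not part of the original text; the function names `ar2_acf` and `ar2_roots` are hypothetical helpers introduced here for illustration. It assumes a stationary AR(2) process and uses the coefficients of the example above ($a_1 = 0.7$, $a_2 = -0.49$).

```python
import cmath


def ar2_acf(a1, a2, max_lag):
    """Theoretical autocorrelations rho_0..rho_max_lag of a stationary AR(2)."""
    rho = [1.0, a1 / (1.0 - a2)]  # rho_0 = 1 and rho_1 from (2.28)
    for s in range(2, max_lag + 1):
        rho.append(a1 * rho[s - 1] + a2 * rho[s - 2])  # recursion (2.29)
    return rho[: max_lag + 1]


def ar2_roots(a1, a2):
    """Characteristic roots of alpha**2 - a1*alpha - a2 = 0 (possibly complex)."""
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
    return (a1 + disc) / 2.0, (a1 - disc) / 2.0


rho = ar2_acf(0.7, -0.49, 6)
roots = ar2_roots(0.7, -0.49)
print([round(r, 4) for r in rho])         # autocorrelations at lags 0..6
print([round(abs(z), 4) for z in roots])  # moduli of the characteristic roots
```

For these coefficients both roots are complex with modulus 0.7, which is why the autocorrelations change sign while shrinking toward zero.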

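Finally, we may wish to find the covariances rather than the autocorrelations. Since we know all the autocorrelations, if we can find the variance of $y_t$ (i.e., $\gamma_0$), we can find all the other $\gamma_s$. To find $\gamma_0$, use (2.25) and note that $\rho_s = \gamma_s/\gamma_0$, so

$$\operatorname{var}(y_t)(\rho_0 - a_1 \rho_1 - a_2 \rho_2) = \sigma^2$$

Substitution for $\rho_0$, $\rho_1$, and $\rho_2$ yields

$$\gamma_0 = \operatorname{var}(y_t) = \left[ \frac{1 - a_2}{1 + a_2} \right] \frac{\sigma^2}{(a_1 + a_2 - 1)(a_2 - a_1 - 1)}$$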
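As a worked check of the variance formula (an added illustration, not part of the original derivation), take the AR(2) process of Figure 2.2 with $a_1 = 0.7$ and $a_2 = -0.49$, for which $\rho_1 = 0.7/1.49 \approx 0.4698$ and $\rho_2 \approx -0.1611$:

```latex
\gamma_0
  = \frac{1 - a_2}{1 + a_2}\cdot\frac{\sigma^2}{(a_1 + a_2 - 1)(a_2 - a_1 - 1)}
  = \frac{1.49}{0.51}\cdot\frac{\sigma^2}{(-0.79)(-2.19)}
  \approx 1.689\,\sigma^2 ,
\qquad
\frac{\sigma^2}{1 - a_1\rho_1 - a_2\rho_2}
  = \frac{\sigma^2}{1 - 0.3289 - 0.0790}
  \approx 1.689\,\sigma^2 .
```

Both routes give the same value, so the closed form agrees with (2.25) for this process.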




The Autocorrelation Function of an MA(1) Process 


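Next consider the MA(1) process $y_t = \varepsilon_t + \beta\varepsilon_{t-1}$. Again, obtain the Yule-Walker equations by multiplying $y_t$ by each $y_{t-s}$ and take expectations:

$$\gamma_0 = \operatorname{var}(y_t) = E y_t y_t = E[(\varepsilon_t + \beta\varepsilon_{t-1})(\varepsilon_t + \beta\varepsilon_{t-1})] = (1 + \beta^2)\sigma^2$$

$$\gamma_1 = E y_t y_{t-1} = E[(\varepsilon_t + \beta\varepsilon_{t-1})(\varepsilon_{t-1} + \beta\varepsilon_{t-2})] = \beta\sigma^2$$

and

$$\gamma_s = E y_t y_{t-s} = E[(\varepsilon_t + \beta\varepsilon_{t-1})(\varepsilon_{t-s} + \beta\varepsilon_{t-s-1})] = 0 \quad \text{for all } s > 1$$

Hence, by dividing each $\gamma_s$ by $\gamma_0$, it is immediately seen that the ACF is simply $\rho_0 = 1$, $\rho_1 = \beta/(1 + \beta^2)$, and $\rho_s = 0$ for all $s > 1$. The third graph on the left-hand side of Figure 2.2 shows the ACF for the MA(1) process $y_t = \varepsilon_t - 0.7\varepsilon_{t-1}$. As an exercise, you should demonstrate that the ACF for the MA(2) process $y_t = \varepsilon_t + \beta_1\varepsilon_{t-1} + \beta_2\varepsilon_{t-2}$ has two spikes and then cuts to zero.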
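The same multiply-and-take-expectations argument works for any finite moving average, which gives a quick way to check the MA(2) exercise numerically. The sketch below is an added illustration rather than the text's own procedure: `ma_acf` is a hypothetical helper, and the MA(2) coefficients 0.5 and 0.4 are arbitrary values chosen only for the demonstration.

```python
def ma_acf(betas, max_lag):
    """Theoretical ACF of y_t = e_t + betas[0]*e_{t-1} + ... + betas[q-1]*e_{t-q}.

    sigma^2 cancels in gamma_s / gamma_0, so it is not needed.
    """
    psi = [1.0] + list(betas)          # the coefficient on e_t is unity
    q = len(psi) - 1
    gamma0 = sum(c * c for c in psi)   # gamma_0 / sigma^2
    acf = []
    for s in range(max_lag + 1):
        if s > q:
            acf.append(0.0)            # y_t and y_{t-s} share no shocks beyond lag q
        else:
            gamma_s = sum(psi[j] * psi[j + s] for j in range(q - s + 1))
            acf.append(gamma_s / gamma0)
    return acf


ma1 = ma_acf([-0.7], 4)       # the MA(1) process of Figure 2.2
ma2 = ma_acf([0.5, 0.4], 4)   # hypothetical MA(2) coefficients
print([round(r, 4) for r in ma1])
print([round(r, 4) for r in ma2])
```

For the MA(1) case the only nonzero autocorrelation beyond lag 0 is $\rho_1 = -0.7/1.49$; for the MA(2) case lags 1 and 2 are nonzero and every later lag is exactly zero.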


The Autocorrelation Function of an ARMA(1, 1) Process 


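Finally, let $y_t = a_1 y_{t-1} + \varepsilon_t + \beta_1\varepsilon_{t-1}$. Using the now familiar procedure, we find the Yule-Walker equations:

$$E y_t y_t = a_1 E y_{t-1} y_t + E \varepsilon_t y_t + \beta_1 E \varepsilon_{t-1} y_t \;\Rightarrow\; \gamma_0 = a_1\gamma_1 + \sigma^2 + \beta_1(a_1 + \beta_1)\sigma^2 \tag{2.30}$$

$$E y_t y_{t-1} = a_1 E y_{t-1} y_{t-1} + E \varepsilon_t y_{t-1} + \beta_1 E \varepsilon_{t-1} y_{t-1} \;\Rightarrow\; \gamma_1 = a_1\gamma_0 + \beta_1\sigma^2 \tag{2.31}$$

$$E y_t y_{t-2} = a_1 E y_{t-1} y_{t-2} + E \varepsilon_t y_{t-2} + \beta_1 E \varepsilon_{t-1} y_{t-2} \;\Rightarrow\; \gamma_2 = a_1\gamma_1 \tag{2.32}$$

$$E y_t y_{t-s} = a_1 E y_{t-1} y_{t-s} + E \varepsilon_t y_{t-s} + \beta_1 E \varepsilon_{t-1} y_{t-s} \;\Rightarrow\; \gamma_s = a_1\gamma_{s-1} \tag{2.33}$$

Solving (2.30) and (2.31) simultaneously for $\gamma_0$ and $\gamma_1$ yields

$$\gamma_0 = \frac{1 + \beta_1^2 + 2a_1\beta_1}{1 - a_1^2}\,\sigma^2$$

$$\gamma_1 = \frac{(1 + a_1\beta_1)(a_1 + \beta_1)}{1 - a_1^2}\,\sigma^2$$

Hence,

$$\rho_1 = \frac{(1 + a_1\beta_1)(a_1 + \beta_1)}{1 + \beta_1^2 + 2a_1\beta_1} \tag{2.34}$$

and $\rho_s = a_1\rho_{s-1}$ for all $s \geq 2$.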
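The closed form for $\rho_1$ together with the recursion $\rho_s = a_1\rho_{s-1}$ can be sketched in a few lines of Python. This is an added illustration under the assumption $|a_1| < 1$; the helper name `arma11_acf` is hypothetical. The coefficients are those of the ARMA(1, 1) example $y_t = -0.7y_{t-1} + \varepsilon_t - 0.7\varepsilon_{t-1}$ used in this chapter.

```python
def arma11_acf(a1, b1, max_lag):
    """Theoretical ACF of y_t = a1*y_{t-1} + e_t + b1*e_{t-1}, assuming |a1| < 1."""
    # rho_1 from the closed form labeled (2.34) in the text
    rho1 = (1.0 + a1 * b1) * (a1 + b1) / (1.0 + b1 * b1 + 2.0 * a1 * b1)
    rho = [1.0, rho1]
    for s in range(2, max_lag + 1):
        rho.append(a1 * rho[s - 1])  # rho_s = a1 * rho_{s-1} for s >= 2
    return rho[: max_lag + 1]


acf = arma11_acf(-0.7, -0.7, 4)
print([round(r, 4) for r in acf])
```

With a negative $a_1$ the autocorrelations alternate in sign from lag 1 onward, matching the oscillating pattern described for $-1 < a_1 < 0$.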




Thus, the ACF for an ARMA(1, 1) process is such that the magnitude of $\rho_1$ depends on both $a_1$ and $\beta_1$. Beginning with this value of $\rho_1$, the ACF of an ARMA(1, 1) process looks like that of the AR(1) process. If $0 < a_1 < 1$, convergence will be direct, and if $-1 < a_1 < 0$, the autocorrelations will oscillate. The ACF for the function $y_t = -0.7y_{t-1} + \varepsilon_t - 0.7\varepsilon_{t-1}$ is shown in the last graph on the left-hand side of Figure 2.2. The top portion of Worksheet 2.2 derives these autocorrelations.

We leave you with the exercise of deriving the correlogram of the ARMA(2, 1) 
process used in Worksheet 2.1. You should be able to recognize the point that the 
correlogram can reveal the pattern of the autoregressive coefficients. For an 
ARMA(p, q) model beginning at lag q, the values of the $\rho_i$ will satisfy

$$\rho_i = a_1\rho_{i-1} + a_2\rho_{i-2} + \cdots + a_p\rho_{i-p}$$


The first p — 1 values can be treated as initial conditions that satisfy the Yule- 
Walker equations. 


6. THE PARTIAL AUTOCORRELATION FUNCTION 


In an AR(1) process, $y_t$ and $y_{t-2}$ are correlated even though $y_{t-2}$ does not directly appear in the model. The correlation between $y_t$ and $y_{t-2}$ (i.e., $\rho_2$) is equal to the correlation between $y_t$ and $y_{t-1}$ (i.e., $\rho_1$) multiplied by the correlation between $y_{t-1}$ and $y_{t-2}$ (i.e., $\rho_1$ again) so that $\rho_2 = \rho_1^2$. It is important to note that all such "indirect" correlations are present in the ACF of any autoregressive process. In contrast, the partial autocorrelation between $y_t$ and $y_{t-s}$ eliminates the effects of the intervening values $y_{t-1}$ through $y_{t-s+1}$. As such, in an AR(1) process, the partial autocorrelation between $y_t$ and $y_{t-2}$ is equal to zero. The most direct way to find the partial autocorrelation function is to first form the series $\{y_t^*\}$ by subtracting the mean of y ($\mu$) from each observation: $y_t^* = y_t - \mu$. Next, form the first-order autoregression equation:

$$y_t^* = \phi_{11} y_{t-1}^* + e_t$$

where $e_t$ = an error term.

Here, the symbol $\{e_t\}$ is used since this error process may not be white-noise.

Since there are no intervening values, $\phi_{11}$ is both the autocorrelation and partial autocorrelation between $y_t$ and $y_{t-1}$. Now form the second-order autoregression equation:

$$y_t^* = \phi_{21} y_{t-1}^* + \phi_{22} y_{t-2}^* + e_t$$

Here, $\phi_{22}$ is the partial autocorrelation coefficient between $y_t$ and $y_{t-2}$. In other words, $\phi_{22}$ is the correlation between $y_t$ and $y_{t-2}$ controlling for (i.e., "netting out") the effect of $y_{t-1}$. Repeating this process for all additional lags s yields the partial autocorrelation function (PACF). In practice, with sample size T, only T/4 lags are used in obtaining the sample PACF.

Since most statistical computer packages perform these transformations, there is 
little need to elaborate on the computational procedure. However, it should be 
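pointed out that a simple computational method relying on the so-called Yule-Walker equations is available. One can form the partial autocorrelations from the autocorrelations as

$$\phi_{11} = \rho_1 \tag{2.35}$$

$$\phi_{22} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2} \tag{2.36}$$

and for additional lags,

$$\phi_{ss} = \frac{\rho_s - \sum_{j=1}^{s-1} \phi_{s-1,j}\,\rho_{s-j}}{1 - \sum_{j=1}^{s-1} \phi_{s-1,j}\,\rho_j}, \qquad s = 3, 4, 5, \ldots \tag{2.37}$$

where $\phi_{sj} = \phi_{s-1,j} - \phi_{ss}\phi_{s-1,s-j}$, $j = 1, 2, 3, \ldots, s - 1$.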
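Equations (2.35) through (2.37) describe an iterative scheme, which the following Python sketch implements directly. It is an added illustration, not the routine used by any particular statistical package; `pacf_from_acf` is a hypothetical helper, and the input autocorrelations are those of the ARMA(1, 1) process $y_t = -0.7y_{t-1} + \varepsilon_t - 0.7\varepsilon_{t-1}$ discussed in this chapter.

```python
def pacf_from_acf(rho, max_lag):
    """Partial autocorrelations phi_11..phi_{max_lag,max_lag} from rho[0..max_lag]."""
    phi_prev = [rho[1]]  # phi_11 = rho_1, equation (2.35)
    pacf = [rho[1]]
    for s in range(2, max_lag + 1):
        num = rho[s] - sum(phi_prev[j - 1] * rho[s - j] for j in range(1, s))
        den = 1.0 - sum(phi_prev[j - 1] * rho[j] for j in range(1, s))
        phi_ss = num / den  # equation (2.37); at s = 2 it reduces to (2.36)
        # phi_{s,j} = phi_{s-1,j} - phi_ss * phi_{s-1,s-j} for j = 1..s-1
        phi_curr = [phi_prev[j - 1] - phi_ss * phi_prev[s - j - 1] for j in range(1, s)]
        phi_curr.append(phi_ss)
        phi_prev = phi_curr
        pacf.append(phi_ss)
    return pacf


# Autocorrelations of the ARMA(1, 1) example: rho_1 from (2.34), then rho_s = -0.7*rho_{s-1}
rho = [1.0, (1 + 0.49) * (-0.7 - 0.7) / (1 + 0.49 + 2 * 0.49)]
for _ in range(2, 9):
    rho.append(-0.7 * rho[-1])

pacf = pacf_from_acf(rho, 8)
print([round(p, 3) for p in pacf])

# Sanity check on an AR(1) with a1 = 0.7: only the first partial autocorrelation is nonzero
ar1_pacf = pacf_from_acf([0.7 ** s for s in range(4)], 3)
```

Because the sketch carries full floating-point precision, its results can differ in the third decimal place from hand calculations that round at each step.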


For an AR(p) process, there is no direct correlation between $y_t$ and $y_{t-s}$ for $s > p$. Hence, all values of $\phi_{ss}$ for $s > p$ will be zero and the PACF for a pure AR(p) process should cut to zero for all lags greater than p. This is a useful feature of the PACF that can aid in the identification of an AR(p) model. In contrast, consider the PACF for the MA(1) process $y_t = \varepsilon_t + \beta\varepsilon_{t-1}$. As long as $\beta \neq -1$, we can write $y_t/(1 + \beta L) = \varepsilon_t$, which we know has the infinite-order autoregressive representation:

$$y_t - \beta y_{t-1} + \beta^2 y_{t-2} - \beta^3 y_{t-3} + \cdots = \varepsilon_t$$

As such, the PACF will not jump to zero since $y_t$ will be correlated with all its own lags. Instead, the PACF coefficients exhibit a geometrically decaying pattern. If $\beta < 0$, decay is direct, and if $\beta > 0$, the PACF coefficients oscillate.


Worksheet 2.2 illustrates the procedure used in constructing the PACF for the ARMA(1, 1) model shown in the fifth graph on the right-hand side of Figure 2.2:

$$y_t = -0.7y_{t-1} + \varepsilon_t - 0.7\varepsilon_{t-1}$$

First calculate the autocorrelations. Clearly, $\rho_0 = 1$; use Equation (2.34) to calculate $\rho_1 = -0.8445$. Thereafter, the ACF coefficients decay at the rate $\rho_i = (-0.7)\rho_{i-1}$ for $i \geq 2$. Using (2.35) and (2.36), we obtain $\phi_{11} = -0.8445$ and $\phi_{22} = -0.4250$. All subsequent $\phi_{ss}$ and $\phi_{sj}$ can be calculated from (2.37) as in Worksheet 2.2.


WORKSHEET 2.2 Calculation of the partial autocorrelations of $y_t = -0.7y_{t-1} + \varepsilon_t - 0.7\varepsilon_{t-1}$

STEP 1: Calculate the autocorrelations. Use (2.34) to calculate $\rho_1$ as

$$\rho_1 = \frac{(1 + 0.49)(-0.7 - 0.7)}{1 + 0.49 + 2(0.49)} = -0.8445$$

The remaining correlations decay at the rate $\rho_i = -0.7\rho_{i-1}$, so that

$\rho_2 = 0.591$, $\rho_3 = -0.414$, $\rho_4 = 0.290$, $\rho_5 = -0.203$, $\rho_6 = 0.142$, $\rho_7 = -0.099$, $\rho_8 = 0.070$, $\rho_9 = -0.049$

STEP 2: Calculate the first two partial autocorrelations using (2.35) and (2.36). Hence

$$\phi_{11} = \rho_1 = -0.8445$$

$$\phi_{22} = [0.591 - (-0.8445)^2]/[1 - (-0.8445)^2] = -0.425$$

STEP 3: Construct all remaining $\phi_{ss}$ iteratively using (2.37). To find $\phi_{33}$, note that $\phi_{21} = \phi_{11} - \phi_{22}\phi_{11} = -1.204$ and form

$$\phi_{33} = \frac{\rho_3 - \sum_{j=1}^{2} \phi_{2j}\,\rho_{3-j}}{1 - \sum_{j=1}^{2} \phi_{2j}\,\rho_j}$$

$$= \frac{-0.414 - (-1.204)(0.591) - (-0.425)(-0.8445)}{1 - (-1.204)(-0.8445) - (-0.425)(0.591)} = -0.262$$

Similarly, to find $\phi_{44}$, use

$$\phi_{44} = \frac{\rho_4 - \sum_{j=1}^{3} \phi_{3j}\,\rho_{4-j}}{1 - \sum_{j=1}^{3} \phi_{3j}\,\rho_j}$$

Since $\phi_{3j} = \phi_{2j} - \phi_{33}\phi_{2,3-j}$, it follows that $\phi_{31} = -1.315$ and $\phi_{32} = -0.740$. Hence,

$$\phi_{44} = -0.173$$

If we continue in this fashion, it is possible to demonstrate that $\phi_{55} = -0.117$, $\phi_{66} = -0.081$, $\phi_{77} = -0.056$, and $\phi_{88} = -0.039$.

More generally, the PACF of a stationary ARMA(p, q) process must ultimately decay toward zero beginning at lag p. The decay pattern depends on the coefficients of the polynomial $(1 + \beta_1 L + \beta_2 L^2 + \cdots + \beta_q L^q)$. Table 2.1 summarizes some of the properties of the ACF and PACF for various ARMA processes. Also, the right-hand-side graphs of Figure 2.2 show the partial autocorrelation functions of the five indicated processes.

Table 2.1: Properties of the ACF and PACF

| Process | ACF | PACF |
|---|---|---|
| White-noise | All $\rho_s = 0$. | All $\phi_{ss} = 0$. |
| AR(1): $a_1 > 0$ | Direct exponential decay: $\rho_s = a_1^s$. | $\phi_{11} = \rho_1$; $\phi_{ss} = 0$ for $s \geq 2$. |
| AR(1): $a_1 < 0$ | Oscillating decay: $\rho_s = a_1^s$. | $\phi_{11} = \rho_1$; $\phi_{ss} = 0$ for $s \geq 2$. |
| AR(p) | Decays toward zero. Coefficients may oscillate. | Spikes through lag p. All $\phi_{ss} = 0$ for $s > p$. |
| MA(1): $\beta > 0$ | Positive spike at lag 1. $\rho_s = 0$ for $s \geq 2$. | Oscillating decay: $\phi_{11} > 0$. |
| MA(1): $\beta < 0$ | Negative spike at lag 1. $\rho_s = 0$ for $s \geq 2$. | Decay: $\phi_{11} < 0$. |
| ARMA(1, 1): $a_1 > 0$ | Exponential decay beginning at lag 1. Sign $\rho_1$ = sign($a_1 + \beta$). | Oscillating decay beginning at lag 1. $\phi_{11} = \rho_1$. |
| ARMA(1, 1): $a_1 < 0$ | Oscillating decay beginning at lag 1. Sign $\rho_1$ = sign($a_1 + \beta$). | Exponential decay beginning at lag 1. $\phi_{11} = \rho_1$ and sign($\phi_{ss}$) = sign($\phi_{11}$). |
| ARMA(p, q) | Decay (either direct or oscillatory) beginning at lag q. | Decay (either direct or oscillatory) beginning at lag p. |

For stationary processes, the key points to note are the following:

1. The ACF of an ARMA(p, q) process will begin to decay at lag q. Beginning at lag q, the coefficients of the ACF (i.e., the $\rho_i$) will satisfy the difference equation ($\rho_i = a_1\rho_{i-1} + a_2\rho_{i-2} + \cdots + a_p\rho_{i-p}$). Since the characteristic roots are inside the unit circle, the autocorrelations will decay beginning at lag q. Moreover, the pattern of the autocorrelation coefficients will mimic that suggested by the characteristic roots.

2. The PACF of an ARMA(p, q) process will begin to decay at lag p. Beginning at lag p, the coefficients of the PACF (i.e., the $\phi_{ss}$) will mimic the ACF coefficients from the model $y_t/(1 + \beta_1 L + \beta_2 L^2 + \cdots + \beta_q L^q)$.

We can illustrate the usefulness of the ACF and PACF functions using the model $y_t = a_0 + 0.7y_{t-1} + \varepsilon_t$. If we compare the top two graphs of Figure 2.2, the ACF shows the monotonic decay of the autocorrelations, while the PACF exhibits the single spike at lag 1. Suppose that a researcher collected sample data and plotted the ACF and PACF functions. If the actual patterns compared favorably to the theoretical patterns, the researcher might try to estimate data using an AR(1) model.


Correspondingly, if the ACF exhibited a single spike and the PACF monotonic decay (see the third graph of the figure for the model $y_t = \varepsilon_t - 0.7\varepsilon_{t-1}$), the researcher might try an MA(1) model.


7. SAMPLE AUTOCORRELATIONS OF STATIONARY SERIES


In practice, the theoretical mean, variance, and autocorrelations of a series are unknown to the researcher. Given that a series is stationary, we can use the sample mean, variance, and autocorrelations to estimate the parameters of the actual data-generating process. Let there be T observations labeled $y_1$ through $y_T$. We can let $\bar{y}$, $\hat{\sigma}^2$, and $r_s$ be estimates of $\mu$, $\sigma^2$, and $\rho_s$, respectively, where:

$$\bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t \tag{2.38}$$

$$\hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^{T} (y_t - \bar{y})^2 \tag{2.39}$$

and for each value of $s = 1, 2, \ldots$,

$$r_s = \frac{\sum_{t=s+1}^{T} (y_t - \bar{y})(y_{t-s} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2} \tag{2.40}$$

The sample autocorrelation function [i.e., the ACF derived from (2.40)] and PACF can be compared to various theoretical functions to help identify the actual nature of the data-generating process. Box and Jenkins (1976) discuss the distribution of the sample values of $r_s$ under the null that $y_t$ is stationary with normally distributed errors. Allowing var($r_s$) to denote the sampling variance of $r_s$, they obtain

$$\operatorname{var}(r_s) = \begin{cases} T^{-1} & \text{for } s = 1 \\ \left(1 + 2\sum_{j=1}^{s-1} r_j^2\right) T^{-1} & \text{for } s > 1 \end{cases} \tag{2.41}$$

if the true value of $r_s = 0$ [i.e., if the true data-generating process is an MA(s - 1) process]. Moreover, in large samples (i.e., for large values of T), $r_s$ will be normally distributed with a mean equal to zero. For the PACF coefficients, under the null hypothesis of an AR(p) model (i.e., under the null that all $\phi_{p+i,p+i}$ are zero), the variance of the $\phi_{p+i,p+i}$ is approximately $T^{-1}$.

In practice, we can use these sample values to form the sample autocorrelation and partial autocorrelation functions and test for significance using (2.41). For example, if we use a 95% confidence interval (i.e., two standard deviations), and the calculated value of $r_1$ exceeds $2T^{-1/2}$, it is possible to reject the null hypothesis that the first-order autocorrelation is not statistically different from zero. Rejecting this hypothesis means rejecting an MA(s - 1) = MA(0) process and accepting the alternative q > 0. Next, try s = 2; var($r_2$) is $(1 + 2r_1^2)/T$. If $r_1$ is 0.5 and T is 100, the variance of $r_2$ is 0.015 and the standard deviation about 0.123. Thus, if the calculated value of $r_2$ exceeds 2(0.123), it is possible to reject the hypothesis $r_2 = 0$. Here, rejecting the null means accepting the alternative that q > 1. Repeating for the various values of s is helpful in identifying the order of the process. In practice, the maximum number of sample autocorrelations and partial autocorrelations to use is T/4.

When looking over a large number of autocorrelations, we will see that some exceed two standard deviations as a result of pure chance even though the true values in the data-generating process are zero. The Q-statistic can be used to test whether a group of autocorrelations is significantly different from zero. Box and Pierce (1970) used the sample autocorrelations to form the statistic

$$Q = T\sum_{k=1}^{s} r_k^2$$

If the data are generated from a stationary ARMA process, Q is asymptotically $\chi^2$ distributed with s degrees of freedom. The intuition behind the use of the statistic is that high sample autocorrelations lead to large values of Q. Certainly, a white-noise process (in which all autocorrelations should be zero) would have a Q value of zero. If the calculated value of Q exceeds the appropriate value in a $\chi^2$ table, we can reject the null of no significant autocorrelations. Note that rejecting the null means accepting an alternative that at least one autocorrelation is not zero.

A problem with the Box-Pierce Q-statistic is that it works poorly even in moderately large samples. Ljung and Box (1978) report superior small sample performance for the modified Q-statistic calculated as

$$Q = T(T + 2)\sum_{k=1}^{s} \frac{r_k^2}{T - k} \tag{2.42}$$

If the sample value of Q calculated from (2.42) exceeds the critical value of $\chi^2$ with s degrees of freedom, then at least one value of $r_k$ is statistically different from zero at the specified significance level. The Box-Pierce and Ljung-Box Q-statistics also serve as a check to see if the residuals from an estimated ARMA(p, q) model behave as a white-noise process. However, when we form the s correlations from an estimated ARMA(p, q) model, the degrees of freedom are reduced by the number of estimated coefficients. Hence, if using the residuals of an ARMA(p, q) model, Q has a $\chi^2$ distribution with s - p - q degrees of freedom (if a constant is included, the degrees of freedom are s - p - q - 1).
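The sample statistics in (2.38) through (2.42) translate almost line for line into code. The sketch below is an added illustration in plain Python; the function names are hypothetical, `r` is a list whose first element is $r_1$, and critical values still have to be read from a $\chi^2$ table because the standard library does not provide them.

```python
import math


def sample_acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag as in (2.40)."""
    T = len(y)
    ybar = sum(y) / T  # sample mean, (2.38)
    denom = sum((v - ybar) ** 2 for v in y)
    return [
        sum((y[t] - ybar) * (y[t - s] - ybar) for t in range(s, T)) / denom
        for s in range(1, max_lag + 1)
    ]


def acf_variance(r, s, T):
    """Sampling variance of r_s under the null of an MA(s - 1) process, (2.41)."""
    return (1.0 + 2.0 * sum(rj * rj for rj in r[: s - 1])) / T


def box_pierce(r, T):
    """Box-Pierce Q-statistic for the autocorrelations in r."""
    return T * sum(rk * rk for rk in r)


def ljung_box(r, T):
    """Ljung-Box Q-statistic, (2.42)."""
    return T * (T + 2) * sum(rk * rk / (T - k) for k, rk in enumerate(r, start=1))


# The numerical example from the text: r_1 = 0.5 and T = 100
var_r2 = acf_variance([0.5], 2, 100)
print(round(var_r2, 3), round(math.sqrt(var_r2), 3))

# A tiny made-up series, used only to exercise the functions
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
r_toy = sample_acf(toy, 2)
print(r_toy, box_pierce(r_toy, len(toy)), ljung_box(r_toy, len(toy)))
```

The Ljung-Box value is never smaller than the Box-Pierce value for the same autocorrelations, since each term is scaled by $(T + 2)/(T - k) > 1$.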


Model Selection Criteria 


One natural question to ask of any estimated model is: How well does it fit the data? Adding additional lags for p and/or q will necessarily reduce the sum of squares of the estimated residuals. However, adding such lags entails the estimation of additional coefficients and an associated loss of degrees of freedom. Moreover, the inclusion of extraneous coefficients will reduce the forecasting performance of the fitted model. There exist various model selection criteria that trade off a reduction in the sum of squares of the residuals for a more parsimonious model. The two most commonly used model selection criteria are the Akaike information criterion (AIC) and Schwarz Bayesian criterion (SBC), calculated as


AIC = T In(residual sum of squares) + 2n 
SBC = T In(residual sum of squares) + n In(T) 


where n = number of parameters estimated (p + q + possible constant term);
T = number of usable observations.


Typically in creating lagged variables, some observations are lost. To adequately compare the alternative models, T should be kept fixed. For example, with 100 data points, estimate an AR(1) and AR(2) using only the last 98 observations in each estimation. Compare the two models using T = 98.

Ideally, the AIC and SBC will be as small as possible (note that both can be negative). We can use these criteria to aid in selecting the most appropriate model; model A is said to fit better than model B if the AIC (or SBC) for A is smaller than that for model B. In using the criteria to compare alternative models, we must estimate over the same sample period so that they will be comparable. For each, increasing the number of regressors increases n, but should have the effect of reducing the residual sum of squares. Thus, if a regressor has no explanatory power, adding it to the model will cause both the AIC and SBC to increase. Since ln(T) will be greater than 2, the SBC will always select a more parsimonious model than the AIC; the marginal cost of adding regressors is greater with the SBC than the AIC.
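To make the trade-off concrete, the following worked step (an added illustration derived from the two formulas above, with $SSR_n$ denoting the residual sum of squares of a model with n parameters) asks how far the residual sum of squares must fall before one extra parameter lowers each criterion, using T = 98 as in the example above:

```latex
\begin{aligned}
\Delta\mathrm{AIC} < 0
  &\iff T\ln\frac{SSR_{n+1}}{SSR_{n}} + 2 < 0
  \iff \frac{SSR_{n+1}}{SSR_{n}} < e^{-2/T} \approx 0.980 \quad (T = 98),\\[4pt]
\Delta\mathrm{SBC} < 0
  &\iff T\ln\frac{SSR_{n+1}}{SSR_{n}} + \ln T < 0
  \iff \frac{SSR_{n+1}}{SSR_{n}} < e^{-\ln(T)/T} \approx 0.954 \quad (T = 98).
\end{aligned}
```

With 98 usable observations, an additional coefficient must cut the residual sum of squares by roughly 2.0% to lower the AIC but by roughly 4.6% to lower the SBC.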

Of the two criteria, the SBC has superior large sample properties. Let the true or- 
der of the data-generating process be (p*, q*) and suppose that we use the AIC and 
SBC to estimate all ARMA models of order (p, q) where $p \geq p^*$ and $q \geq q^*$. Both
the AIC and SBC will select models of orders greater than or equal to (p*, q*) as 
the sample size approaches infinity. However, the SBC is asymptotically consis- 
tent, whereas the AIC is biased toward selecting an overparameterized model. 




Estimation of an AR(1) Model


Let us use a specific example to see how the sample autocorrelation function and partial autocorrelation function can be used as an aid in identifying an ARMA model. A computer program was used to draw 100 normally distributed random numbers with a theoretical variance equal to unity. Call these random variates $\varepsilon_t$, where t runs from 1 to 100. Beginning with t = 1, values of $y_t$ were generated using the formula $y_t = 0.7y_{t-1} + \varepsilon_t$ and initial condition $y_0 = 0$. Note that the problem of nonstationarity is avoided since the initial condition is consistent with long-run equilibrium. The upper-left-hand graph of Figure 2.3 shows the sample correlogram and upper-right-hand graph the sample PACF. You should take a minute to compare the ACF and PACF to those of the theoretical processes illustrated in Figure 2.2.
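A comparable experiment can be sketched in Python. This is an added illustration, not the program used for the text: the random draws come from Python's `random` module with an arbitrary seed, so the sample autocorrelations it produces will not match the values reported for the book's data set. The helper names `simulate_ar1` and `sample_acf` are hypothetical.

```python
import random


def simulate_ar1(a1, T, seed=0):
    """Generate y_1..y_T from y_t = a1*y_{t-1} + e_t with y_0 = 0 and N(0, 1) shocks."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    y, prev = [], 0.0
    for _ in range(T):
        prev = a1 * prev + rng.gauss(0.0, 1.0)
        y.append(prev)
    return y


def sample_acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag as in (2.40)."""
    T = len(y)
    ybar = sum(y) / T
    denom = sum((v - ybar) ** 2 for v in y)
    return [
        sum((y[t] - ybar) * (y[t - s] - ybar) for t in range(s, T)) / denom
        for s in range(1, max_lag + 1)
    ]


y = simulate_ar1(0.7, 100, seed=0)
r = sample_acf(y, 3)
theory = [0.7 ** s for s in (1, 2, 3)]
for s, (rs, ts) in enumerate(zip(r, theory), start=1):
    print(f"lag {s}: sample {rs:.3f}, theoretical {ts:.3f}")
```

Changing the seed gives a different realization of the same process, which is a simple way to see how much sample autocorrelations from 100 observations can wander around their theoretical values.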

In practice, we never know the true data-generating process. However, suppose 
we were presented with these 100 sample values and asked to uncover the true 
process. The first step might be to compare the sample ACF and PACF to those of 
the various theoretical models. The decaying pattern of the ACF and the single large spike in the sample PACF suggest an AR(1) model. The first three autocorrelations are $r_1 = 0.74$, $r_2 = 0.58$, and $r_3 = 0.47$, which are somewhat greater than the theoretical values of 0.7, 0.49 ($0.7^2 = 0.49$), and 0.343. In the PACF, there is a sizable spike of 0.74 at lag one and all other partial autocorrelations (except for lag 12) are very small.

[Figure 2.3 ACF and PACF for two simulated processes. Panels: ACF for AR(1) Process; PACF for AR(1) Process; ACF for ARMA(1, 1) Process; PACF for ARMA(1, 1) Process. Horizontal axes: lags 1 through 19.]

Under the null hypothesis of an MA(0) process, the standard deviation of $r_1$ is $T^{-1/2} = 0.1$. Since the sample value of $r_1 = 0.74$ is more than seven standard deviations from zero, we can reject the null that $r_1$ equals zero. The standard deviation of $r_2$ is obtained by applying (2.41) to the sampling data, where s = 2:

$$\operatorname{var}(r_2) = [1 + 2(0.74)^2]/100 = 0.021$$

Since $(0.021)^{1/2} = 0.1449$, the sample value of $r_2$ is approximately four standard deviations from zero; at conventional significance levels, we can reject the null hypothesis that $r_2$ equals zero. We can similarly test the significance of the other values of the autocorrelations.


As you can see in the second part of the figure, other than $\phi_{11}$, all partial autocorrelations (except for lag 12) are less than $2T^{-1/2} = 0.2$. The decay of the ACF and single spike of the PACF give the strong impression of a first-order autoregressive model. If we did not know the true underlying process and happened to be using monthly data, we might be concerned with the significant partial autocorrelation at lag 12. After all, with monthly data we might expect some direct relationship between $y_t$ and $y_{t-12}$.


Although we know that the data were actually generated from an AR(1) process, it is illuminating to compare the estimates of two different models. Suppose we estimate an AR(1) model and also try to capture the spike at lag 12 with an MA coefficient. Thus, we can consider the two tentative models:

Model 1: $y_t = a_1 y_{t-1} + \varepsilon_t$
Model 2: $y_t = a_1 y_{t-1} + \varepsilon_t + \beta_{12}\varepsilon_{t-12}$


Table 2.2 reports the results of the two estimations. The coefficient of model 1 satisfies the stability condition $|a_1| < 1$ and has a low standard error (the associated t-statistic for a null of zero is more than 12). As a useful diagnostic check, we plot the correlogram of the residuals of the fitted model in Figure 2.4. The Q-statistics for these residuals indicate that each one of the autocorrelations is less than two standard deviations from zero. The Ljung-Box Q-statistics of these residuals indicate that as a group, lags 1 through 8, 1 through 16, and 1 through 24 are not significantly different from zero. This is strong evidence that the AR(1) model "fits" the data well. After all, if residual autocorrelations were significant, the AR(1) model would not be utilizing all available information concerning movements in the $\{y_t\}$ sequence. For example, suppose we wanted to forecast $y_{t+1}$ conditioned on all available information up to and including period t. With model 1, the value of $y_{t+1}$ is $y_{t+1} = a_1 y_t + \varepsilon_{t+1}$. Hence, the forecast from model 1 is $a_1 y_t$. If the residual autocorrelations had been significant, this forecast would not be capturing all the available information set.

Table 2.2: Estimates of an AR(1) Model

| | Model 1: $y_t = a_1 y_{t-1} + \varepsilon_t$ | Model 2: $y_t = a_1 y_{t-1} + \varepsilon_t + \beta_{12}\varepsilon_{t-12}$ |
|---|---|---|
| Degrees of freedom | 99 | 98 |
| Sum of squared residuals | 85.21 | 85.17 |
| Estimated $a_1$ (standard error) | 0.7910 (0.0622) | 0.7953 (0.0683) |
| Estimated $\beta$ (standard error) | | 0.033 (0.4134) |
| AIC/SBC | AIC = 442.07 / SBC = 444.67 | AIC = 444.01 / SBC = 449.21 |
| Ljung-Box Q-statistics for the residuals (significance level in parentheses) | Q(8) = 6.43 (0.490); Q(16) = 15.86 (0.391); Q(24) = 21.74 (0.536) | Q(8) = 6.48 (0.485); Q(16) = 15.75 (0.400); Q(24) = 21.56 (0.547) |

Examining the results for model 2, note that both models yield similar estimates for the first-order autoregressive coefficient and associated standard error. However, the estimate for $\beta_{12}$ is of poor quality; the insignificant t value suggests that it should be dropped from the model. Moreover, comparing the AIC and SBC values of the two models suggests that any benefits of a reduced residual sum of squares are overwhelmed by the detrimental effects of estimating an additional parameter. All these indicators point to the choice of model 1.
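The AIC and SBC entries of Table 2.2 can be approximately reproduced from the reported sums of squared residuals. The sketch below is an added illustration: the functions `aic` and `sbc` simply encode the two formulas from the Model Selection Criteria discussion, and T = 99 is an assumption (it is the number of usable observations that makes the formulas match the reported values to within rounding; the text does not state it explicitly).

```python
import math


def aic(ssr, T, n):
    """AIC = T*ln(residual sum of squares) + 2n."""
    return T * math.log(ssr) + 2 * n


def sbc(ssr, T, n):
    """SBC = T*ln(residual sum of squares) + n*ln(T)."""
    return T * math.log(ssr) + n * math.log(T)


T = 99  # assumed number of usable observations (one is lost to the lag y_{t-1})

model1 = (aic(85.21, T, 1), sbc(85.21, T, 1))  # AR(1): one estimated coefficient
model2 = (aic(85.17, T, 2), sbc(85.17, T, 2))  # AR(1) plus the MA(12) term

print("model 1: AIC = %.2f, SBC = %.2f" % model1)
print("model 2: AIC = %.2f, SBC = %.2f" % model2)
```

The drop in the sum of squared residuals from 85.21 to 85.17 is far too small to offset the penalty for the extra coefficient, so both criteria are lower for model 1.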


[Figure 2.4 ACF of residuals from model 1. Horizontal axis: lags 1 through 24.]


Exercise 7 at the end of this chapter entails various estimations using this data set. In this exercise you are asked to show that the AR(1) model performs better than some alternative specifications. It is important that you complete this exercise.


Estimation of an ARMA(1, 1) Model 


A second $\{y_t\}$ sequence was constructed to illustrate the estimation of an ARMA(1, 1). Given 100 normally distributed values of the $\{\varepsilon_t\}$, 100 values of $\{y_t\}$ were generated using

$$y_t = -0.7y_{t-1} + \varepsilon_t - 0.7\varepsilon_{t-1}$$

where $y_0$ and $\varepsilon_0$ were both set equal to zero.

Both the sample ACF and PACF from the simulated data (see the second set of graphs in Figure 2.3) are roughly equivalent to those of the theoretical model shown in Figure 2.2. However, if the true data-generating process was unknown, the researcher might be concerned about certain discrepancies. An AR(2) model could yield a sample ACF and PACF similar to those in the figure. Table 2.3 reports the results of estimating the data using the following three models:

Model 1: $y_t = a_1 y_{t-1} + \varepsilon_t$
Model 2: $y_t = a_1 y_{t-1} + \varepsilon_t + \beta_1\varepsilon_{t-1}$
Model 3: $y_t = a_1 y_{t-1} + a_2 y_{t-2} + \varepsilon_t$

In examining Table 2.3, notice that all the estimated values of $a_1$ are highly significant; each of the estimated values is at least eight standard deviations from zero. It is clear that the AR(1) model is inappropriate. The Q-statistics for model 1 indicate that there is significant autocorrelation in the residuals. The ARMA(1, 1) model does not suffer from this problem. Moreover, both the AIC and SBC select model 2 over model 1.


Table 2.3: Estimates of an ARMA(1, 1) Model

| | Estimates (a) | Q-Statistics (b) | AIC/SBC |
|---|---|---|---|
| Model 1 | $a_1$: -0.835 (0.053) | Q(8) = 26.19 (0.000); Q(24) = 41.10 (0.001) | AIC = 507.3; SBC = 509.9 |
| Model 2 | $a_1$: -0.679 (0.076); $\beta_1$: -0.676 (0.081) | Q(8) = 3.86 (0.695); Q(24) = 14.23 (0.892) | AIC = 481.4; SBC = 486.6 |
| Model 3 | $a_1$: -1.16 (0.093); $a_2$: -0.378 (0.092) | Q(8) = 11.44 (0.057); Q(24) = 22.59 (0.424) | AIC = 492.5; SBC = 497.7 |

(a) Standard errors in parentheses.
(b) Ljung-Box Q-statistics of the residuals from the fitted model. Significance levels in parentheses.


With the same type of reasoning, model 2 is preferred to model 3. Note that for each model, the estimated coefficients are highly significant and the point estimates imply convergence. Although the Q-statistic at 24 lags indicates that these two models do not suffer from correlated residuals, the Q-statistic at 8 lags indicates serial correlation in the residuals of model 3. Thus, the AR(2) model does not capture short-term dynamics as well as the ARMA(1, 1) model. Also note that the AIC and SBC both select model 2.


Estimation of an AR(2) Model 


A third data series was simulated as

$$y_t = 0.7y_{t-1} - 0.49y_{t-2} + \varepsilon_t$$

The estimated coefficients of the ACF and PACF of the series are

ACF:

Lag
1: 0.4655046 -0.1607289 -0.3216291 -0.1077528 -0.0518159 -0.1649841
7: -0.0995764 0.1283475 0.1795718 0.0343415 -0.0869808 -0.1133948
13: -0.1639613 -0.0579051 0.1151097 0.2540039 0.0460659 -0.1745434
19: -0.1503307 0.0100510 0.0318942 -0.0869327 -0.0456013 0.0516806

PACF:

Lag
1: 0.4655046 -0.4818344 0.0225089 0.0452089 -0.2528370 -0.1206075
7: 0.1011489 0.0367555 -0.0758751 0.0229422 -0.0203879 -0.1391730
13: -0.1671389 0.2066915 0.0074996 0.0851050 -0.2156580 0.0131360
19: -0.0223151 -0.0324078 0.0148130 -0.0609358 0.0374894 -0.1842465

| Coefficient | Estimate | Standard Error | t-Statistic | Significance |
|---|---|---|---|---|
| $a_1$ | 0.692389807 | 0.089515769 | 7.73484 | 0.00000000 |
| $a_2$ | -0.480874620 | 0.089576524 | -5.36831 | 0.00000055 |

AIC = 219.87333, SBC = 225.04327


Overall, the model appears to be adequate. However, the two AR(2) coefficients
are unable to capture the correlations at very long lags. For example, the partial au- 




Jenkins argue that parsimonious models produce better forecasts than overparame- 
terized models. A parsimonious model fits the data well without incorporating any 
needless coefficients. The aim is to approximate the true data-generating process 
but not to pin down the exact process. The goal of parsimony suggested eliminating 
the MA(12) coefficient in the simulated AR(1) model above. 

In selecting an appropriate model, the econometrician needs to be aware that several very different models may have very similar properties. As an extreme example, note that the AR(1) model $y_t = 0.5y_{t-1} + \varepsilon_t$ has the equivalent infinite-order moving average representation $y_t = \varepsilon_t + 0.5\varepsilon_{t-1} + 0.25\varepsilon_{t-2} + 0.125\varepsilon_{t-3} + 0.0625\varepsilon_{t-4} + \cdots$. In most samples, approximating this MA($\infty$) process with an MA(2) or MA(3) model will give a very good fit. However, the AR(1) model is the more parsimonious model and is preferred.
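One way to see why a short moving average fits so well is to ask how much of the variance of $y_t$ the first few moving-average weights account for. The following Python sketch is an added illustration (the helper `ar1_ma_weights` is hypothetical); it uses the fact that the weight on $\varepsilon_{t-i}$ is $0.5^i$ and that var($y_t$)/$\sigma^2$ = 1/(1 - 0.25) for this AR(1) process.

```python
def ar1_ma_weights(a1, n):
    """Moving-average weights on e_t, e_{t-1}, ..., e_{t-n} for y_t = a1*y_{t-1} + e_t."""
    return [a1 ** i for i in range(n + 1)]


weights = ar1_ma_weights(0.5, 4)
total_var = 1.0 / (1.0 - 0.5 ** 2)  # var(y_t) / sigma^2 for the AR(1) process

# Share of the variance captured by truncating at MA(2) and at MA(3)
share_ma2 = sum(w * w for w in weights[:3]) / total_var
share_ma3 = sum(w * w for w in weights[:4]) / total_var
print(weights)
print(round(share_ma2, 4), round(share_ma3, 4))
```

The MA(2) truncation already accounts for about 98.4% of the variance and the MA(3) truncation for about 99.6%, yet either needs more coefficients than the single parameter of the AR(1) model.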

Also be aware of the common factor problem. Suppose we wanted to fit the ARMA(2, 3) model:

$$(1 - a_1 L - a_2 L^2) y_t = (1 + \beta_1 L + \beta_2 L^2 + \beta_3 L^3)\varepsilon_t \tag{2.43}$$

Also suppose that $(1 - a_1 L - a_2 L^2)$ and $(1 + \beta_1 L + \beta_2 L^2 + \beta_3 L^3)$ can each be factored as $(1 + cL)(1 + aL)$ and $(1 + cL)(1 + b_1 L + b_2 L^2)$, respectively. Since $(1 + cL)$ is a common factor to each, (2.43) has the equivalent, but more parsimonious, form:

$$(1 + aL) y_t = (1 + b_1 L + b_2 L^2)\varepsilon_t \tag{2.44}$$

In order to ensure that the model is parsimonious, the various $a_i$ and $\beta_i$ should all have t-statistics of 2.0 or greater (so that each coefficient is significantly different from zero at the 5% level). Moreover, the coefficients should not be strongly correlated with each other. Highly collinear coefficients are unstable; usually one or more can be eliminated from the model without reducing forecast performance.


Stationarity and Invertibility 


The distribution theory underlying the use of the sample ACF and PACF as approximations to those of the true data-generating process assumes that the $\{y_t\}$ sequence is stationary. Moreover, t-statistics and Q-statistics also presume that the data are stationary. The estimated autoregressive coefficients should be consistent with this underlying assumption. Hence, we should be suspicious of an AR(1) model if the estimated value of $a_1$ is close to unity. For an ARMA(2, q) model, the characteristic roots of the estimated polynomial $(1 - a_1 L - a_2 L^2)$ should lie outside of the unit circle.
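The root condition is straightforward to check once coefficient estimates are in hand. The sketch below is an added illustration with hypothetical helper names; it solves $1 - a_1 z - a_2 z^2 = 0$ with the quadratic formula (so it assumes $a_2 \neq 0$) and applies the check to the AR(2) estimates reported earlier in this section ($a_1 = 0.692389807$, $a_2 = -0.480874620$).

```python
import cmath


def ar2_lag_polynomial_roots(a1, a2):
    """Roots z of 1 - a1*z - a2*z**2 = 0 (requires a2 != 0)."""
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
    # Rewrite as a2*z^2 + a1*z - 1 = 0, so z = (-a1 +/- sqrt(a1^2 + 4*a2)) / (2*a2)
    return ((-a1 + disc) / (2.0 * a2), (-a1 - disc) / (2.0 * a2))


def is_stationary_ar2(a1, a2):
    """True when both roots of the lag polynomial lie outside the unit circle."""
    return all(abs(z) > 1.0 for z in ar2_lag_polynomial_roots(a1, a2))


roots = ar2_lag_polynomial_roots(0.692389807, -0.480874620)
print([round(abs(z), 3) for z in roots])
print(is_stationary_ar2(0.692389807, -0.480874620))
print(is_stationary_ar2(0.5, 0.6))  # hypothetical coefficients with a1 + a2 > 1
```

For the estimated AR(2) coefficients both roots have modulus of about 1.44, comfortably outside the unit circle, whereas the hypothetical pair (0.5, 0.6) has one root inside it.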

The Box-Jenkins approach also necessitates that the model be invertible. 


Formally, {y,} is invertible if it can be represented by a finite-order or convergent =» = 
autoregressive process. Invertibility is important because the use of the ACF and : 
PACF implicitly assumes that the {y,} sequence can be well approximated by an 


Box-Jenkins Model Selection 97 


autoregressive model. Asa demonstration, consider the simple MA(1) model: 


$ y,=€,— Bye,, (2.45) 
so that if 1B, ] <1, | 
on WAH BD He) o 
or 
Yet Biya + Biya + Biya + = €, AB) 


If $|\beta_1| < 1$, (2.46) can be estimated using the Box-Jenkins method. However,
if $|\beta_1| \geq 1$, the $\{y_t\}$ sequence cannot be represented by a finite-order or
convergent AR process; as such, it is not invertible. More generally, for an ARMA model
to have a convergent AR representation, the roots of the polynomial
$(1 + \beta_1 L + \beta_2 L^2 + \cdots + \beta_q L^q)$ must lie outside of the unit circle.
Note that there is nothing "improper" about a non-invertible model. The $\{y_t\}$
sequence implied by $y_t = \epsilon_t - \epsilon_{t-1}$ is stationary in that it has a
constant time-invariant mean ($Ey_t = Ey_{t-s} = 0$), a constant time-invariant variance
[$\mathrm{var}(y_t) = \mathrm{var}(y_{t-s}) = \sigma^2(1 + \beta_1^2)$], and the
autocovariances $\gamma_1 = -\beta_1\sigma^2$ and all other $\gamma_s = 0$. The problem is
that the technique does not allow for the estimation of such models. If $\beta_1 = 1$,
(2.46) becomes

$$y_t + y_{t-1} + y_{t-2} + y_{t-3} + y_{t-4} + \cdots = \epsilon_t$$

Clearly, the autocorrelations and partial autocorrelations between $y_t$ and $y_{t-s}$
will never decay.
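The root conditions for stationarity and invertibility can be checked numerically. The following is a minimal sketch, not part of the original text, restricted to lag polynomials of order one or two so that it needs only the standard library. The function names and the value 0.5 are illustrative assumptions; the value 1 is the non-invertible case discussed above.

```python
import cmath

def lag_polynomial_roots(c1, c2=0.0):
    """Roots z of the lag polynomial 1 + c1*z + c2*z**2 (order 1 or 2)."""
    if c2 == 0.0:
        if c1 == 0.0:
            return []                      # constant polynomial: no roots
        return [complex(-1.0 / c1)]
    disc = cmath.sqrt(c1 * c1 - 4.0 * c2)  # complex square root handles complex roots
    return [(-c1 + disc) / (2.0 * c2), (-c1 - disc) / (2.0 * c2)]

def roots_outside_unit_circle(c1, c2=0.0):
    """True if every root of 1 + c1*z + c2*z**2 has modulus greater than one."""
    return all(abs(z) > 1.0 for z in lag_polynomial_roots(c1, c2))

# The MA(1) model (2.45) has lag polynomial 1 - beta_1*L, so c1 = -beta_1.
invertible_case = roots_outside_unit_circle(-0.5)   # hypothetical beta_1 = 0.5
unit_root_case = roots_outside_unit_circle(-1.0)    # beta_1 = 1, as in the text
```

The same function can be applied to an estimated AR(2) polynomial $(1 - a_1 L - a_2 L^2)$ by passing `c1 = -a_1` and `c2 = -a_2`.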


Goodness of Fit 


A good model will fit the data well. Obviously, $R^2$ and the average of the residual
sum of squares are common "goodness-of-fit" measures in ordinary least squares.
The problem with these measures is that the "fit" necessarily improves as more
parameters are included in the model. Parsimony suggests using the AIC and/or SBC
as more appropriate measures of the overall fit of the model. Also, be cautious of
estimates that fail to converge rapidly. Most software packages estimate the
parameters of an ARMA model using non-linear search procedures. If the search fails to
converge rapidly, it is possible that the estimated parameters are unstable. In such
circumstances, adding an additional observation or two can greatly alter the
estimates.


The third stage in the Box-Jenkins methodology involves diagnostic checking.

The standard practice is to plot the residuals to look for outliers and evidence of
periods in which the model does not fit the data well. If all plausible ARMA models
show evidence of a poor fit during a reasonably long portion of the sample, it is
wise to consider using intervention analysis, transfer function analysis, or any other
of the multivariate estimation methods discussed in later chapters. If the variance of
the residuals is increasing, a logarithmic transformation may be appropriate.
Alternatively, you may wish to actually model any tendency of the variance to
change using the ARCH techniques discussed in Chapter 3.

It is particularly important that the residuals from an estimated model be serially
uncorrelated. Any evidence of serial correlation implies a systematic movement in
the $\{y_t\}$ sequence that is not accounted for by the ARMA coefficients included in
the model. Hence, any of the tentative models yielding nonrandom residuals should
be eliminated from consideration. To check for correlation in the residuals,
construct the ACF and PACF of the residuals of the estimated model. You can then
use (2.41) and (2.42) to determine whether any or all of the residual autocorrelations
or partial autocorrelations are statistically significant. Although there is no
significance level that is deemed "most appropriate," be wary of any model yielding
(1) several residual correlations that are marginally significant and (2) a
$Q$-statistic that is barely significant at the 10% level. In such circumstances, it is
usually possible to formulate a better performing model.

In the previous section, recall that the estimated AR(1) model had Box-Ljung
$Q$-statistics indicating a possible MA term at lag 12. As a result, we also estimated
the model $y_t = 0.7953y_{t-1} + \epsilon_t - 0.033\epsilon_{t-12}$. The procedure of
adding another coefficient is called overfitting. Overfit a model if the initial ACF and
PACF yield ambiguous implications concerning the proper form of the ARMA coefficients.
In the first example, the AR(1) model (i.e., model 1) outperformed the ARMA(1, 1)
model. Obviously, in other circumstances, the "overfitted" model may outperform
the first model. As an additional diagnostic check, some researchers will overfit a
model by including a coefficient at some randomly selected lag. If such overfitting
greatly affects the model, the estimated model is likely to yield poor forecasts.

If there are sufficient observations, fitting the same ARMA model to each of two
subsamples can provide useful information concerning the assumption that the
data-generating process is unchanging. In the estimated AR(2) model in the last
section, the sample was split in half. In general, suppose you estimated an
ARMA($p$, $q$) model using a sample size of $T$ observations. Denote the sum of the
squared residuals as SSR. Divide the $T$ observations into two subsamples with $t_m$
observations in the first and $t_n = T - t_m$ observations in the second. Use each
subsample to estimate the two models:

$$y_t = a_0(1) + a_1(1)y_{t-1} + \cdots + a_p(1)y_{t-p} + \epsilon_t + \beta_1(1)\epsilon_{t-1} + \cdots + \beta_q(1)\epsilon_{t-q} \quad \text{using } t_1, \ldots, t_m$$

$$y_t = a_0(2) + a_1(2)y_{t-1} + \cdots + a_p(2)y_{t-p} + \epsilon_t + \beta_1(2)\epsilon_{t-1} + \cdots + \beta_q(2)\epsilon_{t-q} \quad \text{using } t_{m+1}, \ldots, t_T$$

Let the sum of the squared residuals from each model be $SSR_1$ and $SSR_2$,
respectively. To test the restriction that all coefficients are equal [i.e.,
$a_0(1) = a_0(2)$ and $a_1(1) = a_1(2)$ and ... $a_p(1) = a_p(2)$ and
$\beta_1(1) = \beta_1(2)$ and ... $\beta_q(1) = \beta_q(2)$], use an $F$-test and form:

$$F = \frac{(SSR - SSR_1 - SSR_2)/n}{(SSR_1 + SSR_2)/(T - 2n)} \tag{2.47}$$

where $n$ = number of parameters estimated ($n = p + q + 1$ if an intercept is
included and $p + q$ otherwise) and the number of degrees of freedom are ($n$, $T - 2n$).

Intuitively, if the restriction that the two sets of coefficients are equal is not
binding, the total from the two models (i.e., $SSR_1 + SSR_2$) should equal the sum of
the squared residuals from the entire sample estimation. Hence, $F$ should equal zero.
The larger the calculated value of $F$, the more restrictive is the assumption that the
two sets of coefficients are equal.

Similarly, the model can be estimated over nearly all the sample period. If we
use 20 years of quarterly data, for example, the model might be estimated using
only the first 19 years of data. Then, the model can be used to make forecasts of the
last year of data. For each period $t$, the forecast error is the difference between the
forecast and known value of $y_t$. The sum of the squared forecast errors is a useful
way to compare the adequacy of alternative models. Those models with poor
out-of-sample forecasts should be eliminated. Some of the details in constructing
out-of-sample forecasts are discussed in the next section.


9. THE FORECAST FUNCTION 


Perhaps the most important use of an ARMA model is to forecast future values of
the $\{y_t\}$ sequence. To simplify the discussion, it is assumed that the actual
data-generating process and current and past realizations of the $\{\epsilon_t\}$ and
$\{y_t\}$ sequences are known to the researcher. First, consider the forecasts from the
AR(1) model $y_t = a_0 + a_1 y_{t-1} + \epsilon_t$. Updating one period, we obtain

$$y_{t+1} = a_0 + a_1 y_t + \epsilon_{t+1}$$

If you know the coefficients $a_0$ and $a_1$, you can forecast $y_{t+1}$ conditioned on
the information available at period $t$ as

$$E_t y_{t+1} = a_0 + a_1 y_t \tag{2.48}$$

where $E_t y_{t+j}$ = a short-hand way to write the conditional expectation of $y_{t+j}$
given the information available at $t$.

Formally, $E_t y_{t+j} = E(y_{t+j} \mid y_t, y_{t-1}, y_{t-2}, \ldots, \epsilon_t, \epsilon_{t-1}, \ldots)$.


In the same way, since $y_{t+2} = a_0 + a_1 y_{t+1} + \epsilon_{t+2}$, the forecast of
$y_{t+2}$ conditioned on the information available at period $t$ is

$$E_t y_{t+2} = a_0 + a_1 E_t y_{t+1}$$

and using (2.48), we obtain

$$E_t y_{t+2} = a_0 + a_1(a_0 + a_1 y_t)$$

It should not require too much effort to convince yourself that

$$E_t y_{t+3} = a_0 + a_1(a_0 + a_1 a_0) + a_1^3 y_t$$

and in general,

$$E_t y_{t+j} = a_0(1 + a_1 + a_1^2 + \cdots + a_1^{j-1}) + a_1^j y_t \tag{2.50}$$


Equation (2.50), called the forecast function, yields the $j$-step ahead forecasts
for each value $y_{t+j}$. Since $|a_1| < 1$, (2.50) yields a convergent sequence of
forecasts. If we take the limit of $E_t y_{t+j}$ as $j \to \infty$, we find that
$E_t y_{t+j} \to a_0/(1 - a_1)$. This result is really quite general. For any stationary
ARMA model, the conditional forecast of $y_{t+j}$ converges to the unconditional mean as
$j \to \infty$. Unfortunately, the forecasts from an ARMA model will not be perfectly
accurate. Forecasting from time period $t$, we can define the $j$-step ahead forecast
error, $f_t(j)$, as the difference between the realized value of $y_{t+j}$ and the
forecasted value:

$$f_t(j) \equiv y_{t+j} - E_t y_{t+j}$$


Hence, the one-step ahead forecast error is $f_t(1) = y_{t+1} - E_t y_{t+1} = \epsilon_{t+1}$
(i.e., the "unforecastable" portion of $y_{t+1}$ given the information available in $t$).
To find the two-step ahead forecast error, we need to form $f_t(2) = y_{t+2} - E_t y_{t+2}$.
Since $y_{t+2} = a_0 + a_1 a_0 + a_1^2 y_t + \epsilon_{t+2} + a_1\epsilon_{t+1}$ and
$E_t y_{t+2} = a_0 + a_1 a_0 + a_1^2 y_t$, it follows that

$$f_t(2) = \epsilon_{t+2} + a_1\epsilon_{t+1}$$

You should take a few moments to demonstrate that for the AR(1) model, the
$j$-step ahead forecast error is given by

$$f_t(j) = \epsilon_{t+j} + a_1\epsilon_{t+j-1} + a_1^2\epsilon_{t+j-2} + a_1^3\epsilon_{t+j-3} + \cdots + a_1^{j-1}\epsilon_{t+1} \tag{2.51}$$

Equation (2.51) shows that the forecasts from (2.50) yield unbiased estimates of
each value $y_{t+j}$. The proof is trivial; since
$E_t\epsilon_{t+j} = E_t\epsilon_{t+j-1} = \cdots = E_t\epsilon_{t+1} = 0$, the conditional
expectation of (2.51) is $E_t f_t(j) = 0$. Since the expected value of the forecast
error is zero, the forecasts are unbiased.
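The forecast function (2.50) can be checked against direct forward iteration of the AR(1) recursion. The sketch below is illustrative only; the function names and the parameter values ($a_0 = 1$, $a_1 = 0.5$, $y_t = 4$) are assumptions chosen for the example, not values from the text.

```python
def ar1_forecast(a0, a1, y_t, j):
    """Closed-form j-step ahead forecast from (2.50)."""
    return a0 * sum(a1 ** i for i in range(j)) + a1 ** j * y_t

def ar1_forecast_iterated(a0, a1, y_t, j):
    """Same forecast obtained by iterating E_t y_{t+k} = a0 + a1 * E_t y_{t+k-1}."""
    forecast = y_t
    for _ in range(j):
        forecast = a0 + a1 * forecast
    return forecast

# Hypothetical parameter values for illustration.
a0, a1, y_t = 1.0, 0.5, 4.0
two_step = ar1_forecast(a0, a1, y_t, 2)
long_run = ar1_forecast(a0, a1, y_t, 200)
unconditional_mean = a0 / (1.0 - a1)
```

With $|a_1| < 1$ the long-horizon forecast settles at $a_0/(1 - a_1)$, which is the convergence property described above.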


Although unbiased, the forecasts from an ARMA model are necessarily inaccurate.
To find the variance of the forecast error, continue to assume that the elements
of the $\{\epsilon_t\}$ sequence are independent with variance $\sigma^2$. Hence, from
(2.51) the variance of the forecast error is

$$\mathrm{var}[f_t(j)] = \sigma^2[1 + a_1^2 + a_1^4 + a_1^6 + \cdots + a_1^{2(j-1)}] \tag{2.52}$$

Since the one-step forecast error variance is $\sigma^2$, the two-step ahead forecast
error variance is $\sigma^2(1 + a_1^2)$, etc. The essential point to note is that the
variance of the forecast error is an increasing function of $j$. As such, you can have
more confidence in short-term rather than long-term forecasts. In the limit as
$j \to \infty$, the forecast error variance converges to $\sigma^2/(1 - a_1^2)$; hence, the
forecast error variance converges to the unconditional variance of the $\{y_t\}$ sequence.

Moreover, assuming that the $\{\epsilon_t\}$ sequence is normally distributed, you can
place confidence intervals around the forecasts. The one-step ahead forecast of
$y_{t+1}$ is $a_0 + a_1 y_t$ and the variance is $\sigma^2$. As such, the 95% confidence
interval for the one-step ahead forecast can be constructed as

$$a_0 + a_1 y_t \pm 1.96\sigma$$

In the same way, the two-step ahead forecast is $a_0(1 + a_1) + a_1^2 y_t$ and (2.52)
indicates that $\mathrm{var}[f_t(2)]$ is $\sigma^2(1 + a_1^2)$. Thus, the 95% confidence
interval for the two-step ahead forecast is

$$a_0(1 + a_1) + a_1^2 y_t \pm 1.96\sigma(1 + a_1^2)^{1/2}$$

Of course, if there is any uncertainty concerning the parameters, the confidence
intervals will be wider than those reported here.
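The forecast error variance (2.52) and the resulting 95% intervals are easy to compute for any horizon. The following sketch assumes known coefficients and normally distributed errors, as in the discussion above; the function names and the numerical values are hypothetical.

```python
import math

def ar1_forecast_error_variance(a1, sigma2, j):
    """(2.52): sigma^2 * (1 + a1^2 + a1^4 + ... + a1^(2(j-1)))."""
    return sigma2 * sum(a1 ** (2 * i) for i in range(j))

def ar1_forecast_interval(a0, a1, sigma2, y_t, j, z=1.96):
    """Point forecast from (2.50) plus/minus z standard deviations of the error."""
    point = a0 * sum(a1 ** i for i in range(j)) + a1 ** j * y_t
    half_width = z * math.sqrt(ar1_forecast_error_variance(a1, sigma2, j))
    return point - half_width, point + half_width

# Hypothetical values: a0 = 1, a1 = 0.5, sigma^2 = 1, y_t = 4.
one_step = ar1_forecast_interval(1.0, 0.5, 1.0, 4.0, 1)
two_step = ar1_forecast_interval(1.0, 0.5, 1.0, 4.0, 2)
limit_variance = 1.0 / (1.0 - 0.5 ** 2)   # sigma^2 / (1 - a1^2)
```

The interval widens with the horizon and its half-width approaches $1.96\sigma/(1 - a_1^2)^{1/2}$. Because parameter uncertainty is ignored, these intervals understate the true forecast uncertainty.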


Iterative Forecasts 


The derivation of (2.50), the forecast function for an AR(1) model, relied on
forward iteration. To generalize the discussion, it is possible to use the iterative
technique to derive the forecast function for any ARMA($p$, $q$) model. To keep the
algebra simple, consider the ARMA(2, 1) model:

$$y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + \epsilon_t + \beta_1\epsilon_{t-1} \tag{2.53}$$

Updating one period yields

$$y_{t+1} = a_0 + a_1 y_t + a_2 y_{t-1} + \epsilon_{t+1} + \beta_1\epsilon_t$$

If we continue to assume that (1) all coefficients are known; (2) all variables
subscripted $t$, $t - 1$, $t - 2$, etc. are known at period $t$; and (3)
$E_t\epsilon_{t+j} = 0$ for $j > 0$, the conditional expectation of $y_{t+1}$ is

$$E_t y_{t+1} = a_0 + a_1 y_t + a_2 y_{t-1} + \beta_1\epsilon_t \tag{2.54}$$


Equation (2.54) is the one-step ahead forecast of $y_{t+1}$. To find the two-step ahead
forecast, update (2.53) by two periods:

$$y_{t+2} = a_0 + a_1 y_{t+1} + a_2 y_t + \epsilon_{t+2} + \beta_1\epsilon_{t+1}$$

The conditional expectation of $y_{t+2}$ is

$$E_t y_{t+2} = a_0 + a_1 E_t y_{t+1} + a_2 y_t \tag{2.55}$$

Equation (2.55) expresses the two-step ahead forecast in terms of the one-step
ahead forecast and current value of $y_t$. Combining (2.54) and (2.55) yields

$$E_t y_{t+2} = a_0 + a_1(a_0 + a_1 y_t + a_2 y_{t-1} + \beta_1\epsilon_t) + a_2 y_t$$
$$\phantom{E_t y_{t+2}} = a_0(1 + a_1) + (a_1^2 + a_2)y_t + a_1 a_2 y_{t-1} + a_1\beta_1\epsilon_t$$


You should be able to demonstrate that the three-step ahead forecast is

$$E_t y_{t+3} = a_0 + a_1 E_t y_{t+2} + a_2 E_t y_{t+1}$$
$$\phantom{E_t y_{t+3}} = a_0 + a_1[a_0(1 + a_1) + (a_1^2 + a_2)y_t + a_1 a_2 y_{t-1} + a_1\beta_1\epsilon_t] + a_2(a_0 + a_1 y_t + a_2 y_{t-1} + \beta_1\epsilon_t)$$
$$\phantom{E_t y_{t+3}} = a_0(1 + a_1 + a_1^2 + a_2) + (a_1^3 + 2a_1 a_2)y_t + (a_1^2 a_2 + a_2^2)y_{t-1} + \beta_1(a_1^2 + a_2)\epsilon_t \tag{2.56}$$

Finally, all $j$-step ahead forecasts can be obtained from

$$E_t y_{t+j} = a_0 + a_1 E_t y_{t+j-1} + a_2 E_t y_{t+j-2}, \quad j \geq 2 \tag{2.57}$$

Equations (2.56) and (2.57) suggest that the forecasts will satisfy a second-order
difference equation. As long as the characteristic roots of (2.57) lie inside the unit
circle, the forecasts will converge to the unconditional mean $a_0/(1 - a_1 - a_2)$.
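The iterative scheme in (2.54), (2.55), and (2.57) translates directly into a short loop. The sketch below is a hedged illustration: the function name and all numerical values are hypothetical, and the coefficients are treated as known.

```python
def arma21_forecasts(a0, a1, a2, b1, y_t, y_tm1, eps_t, horizon):
    """Return [E_t y_{t+1}, ..., E_t y_{t+horizon}] for an ARMA(2, 1) model."""
    forecasts = []
    for j in range(1, horizon + 1):
        if j == 1:
            f = a0 + a1 * y_t + a2 * y_tm1 + b1 * eps_t       # (2.54)
        elif j == 2:
            f = a0 + a1 * forecasts[0] + a2 * y_t             # (2.55)
        else:
            f = a0 + a1 * forecasts[-1] + a2 * forecasts[-2]  # (2.57)
        forecasts.append(f)
    return forecasts

# Hypothetical values chosen so that the characteristic roots lie inside the unit circle.
a0, a1, a2, b1 = 1.0, 0.5, 0.2, 0.3
y_t, y_tm1, eps_t = 2.0, 1.0, 0.5
path = arma21_forecasts(a0, a1, a2, b1, y_t, y_tm1, eps_t, 200)

# Three-step ahead forecast written out as in (2.56), for comparison with path[2].
closed_form_3 = (a0 * (1 + a1 + a1 ** 2 + a2)
                 + (a1 ** 3 + 2 * a1 * a2) * y_t
                 + (a1 ** 2 * a2 + a2 ** 2) * y_tm1
                 + b1 * (a1 ** 2 + a2) * eps_t)
```

Note that the moving average term affects only the one-step ahead forecast directly; its influence on later horizons is carried through the autoregressive recursion.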


An Alternative Derivation of the Forecast Function 


Instead of using the iterative technique, it is often preferable to derive the forecast
function using the solution methodology discussed in Section 4 of Chapter 1. For
any ARMA($p$, $q$) model, the solution technique entails (1) finding all homogeneous
solutions; (2) finding the particular solution; (3) forming the general solution as the
sum of the homogeneous and particular solutions; and (4) imposing the initial
conditions. This solution methodology will express $y_t$ in terms of the $p$ initial
conditions $y_0, y_1, \ldots, y_{p-1}$ and $q$ initial values
$\epsilon_0, \epsilon_1, \ldots, \epsilon_{q-1}$. The only twist is that the forecast
function expresses $y_{t+j}$ in terms of $y_t, y_{t-1}, \ldots, y_{t-p+1}$ and
$\epsilon_t, \epsilon_{t-1}, \ldots, \epsilon_{t-q+1}$. To illustrate the appropriate
modification of the time subscripts, consider the AR(2) model:

$$y_t = 3 + 0.9y_{t-1} - 0.2y_{t-2} + \epsilon_t$$

In Section 8 of Chapter 1, it was shown that the solution is

$$y_t = 10 + (0.4)^t[5(y_0 - 10) - 10(y_1 - 10)] + (0.5)^t[10(y_1 - 10) - 4(y_0 - 10)] + \sum_{i=0}^{t-2}\alpha_i\epsilon_{t-i}$$

where the values of $\alpha_i$ satisfy $\alpha_i = 5(0.5)^i - 4(0.4)^i$.

The problem is to modify this equation so as to express $y_{t+j}$ in terms of $y_t$,
$y_{t-1}$, and the $\{\epsilon_t\}$ sequence. Updating by $j$ periods, we find

$$y_{t+j} = 10 + (0.4)^{j+1}[5(y_{t-1} - 10) - 10(y_t - 10)] + (0.5)^{j+1}[10(y_t - 10) - 4(y_{t-1} - 10)] + \sum_{i=0}^{j-1}\alpha_i\epsilon_{t+j-i}$$

Taking the conditional expectation of $y_{t+j}$ yields the forecast function:

$$E_t y_{t+j} = 10 + (0.4)^{j+1}[5(y_{t-1} - 10) - 10(y_t - 10)] + (0.5)^{j+1}[10(y_t - 10) - 4(y_{t-1} - 10)]$$

Obviously, as $j$ increases, the forecast approaches the unconditional mean of 10.
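As a check on the time subscripts, the closed-form forecast function for this AR(2) example can be compared with forward iteration of $E_t y_{t+j} = 3 + 0.9E_t y_{t+j-1} - 0.2E_t y_{t+j-2}$. In the sketch below the powers are written as $(0.4)^{j+1}$ and $(0.5)^{j+1}$, since $y_{t-1}$ and $y_t$ play the roles of $y_0$ and $y_1$; this is what makes the one-step forecast equal $3 + 0.9y_t - 0.2y_{t-1}$. The function names and the starting values 12 and 11 are hypothetical.

```python
def ar2_example_forecast(y_t, y_tm1, j):
    """Closed-form E_t y_{t+j} for y_t = 3 + 0.9*y_{t-1} - 0.2*y_{t-2} + eps_t."""
    x_t, x_tm1 = y_t - 10.0, y_tm1 - 10.0      # deviations from the mean of 10
    return (10.0
            + 0.4 ** (j + 1) * (5.0 * x_tm1 - 10.0 * x_t)
            + 0.5 ** (j + 1) * (10.0 * x_t - 4.0 * x_tm1))

def ar2_example_iterated(y_t, y_tm1, j):
    """Same forecast by iterating the difference equation with future shocks set to zero."""
    prev, curr = y_tm1, y_t
    for _ in range(j):
        prev, curr = curr, 3.0 + 0.9 * curr - 0.2 * prev
    return curr

one_step = ar2_example_forecast(12.0, 11.0, 1)   # hypothetical y_t = 12, y_{t-1} = 11
```

Both routes give the same forecasts at every horizon, and both approach 10 as $j$ grows.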
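For practice, try the ARMA(1, 1) model:

$$y_t = a_0 + a_1 y_{t-1} + \epsilon_t + \beta_1\epsilon_{t-1}$$

where $\{\epsilon_t\}$ is a white-noise process, $|a_1| < 1$, and there is a given
initial condition for $y_0$.

You should recognize that the homogeneous equation $y_t - a_1 y_{t-1} = 0$ has the
solution $A(a_1)^t$, where $A$ is an arbitrary constant. Next, use lag operators to
obtain the particular solution as

$$y_t = a_0/(1 - a_1) + \epsilon_t/(1 - a_1 L) + \beta_1\epsilon_{t-1}/(1 - a_1 L) \tag{2.58}$$

so that the general solution is

$$y_t = a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i\epsilon_{t-i} + \beta_1\sum_{i=0}^{\infty} a_1^i\epsilon_{t-1-i} + A a_1^t \tag{2.59}$$

Now impose the initial condition for $y_0$. Since (2.59) must hold for all periods,
including period zero, it follows that

$$y_0 = a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i\epsilon_{-i} + \beta_1\sum_{i=0}^{\infty} a_1^i\epsilon_{-1-i} + A \tag{2.60}$$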

Solving (2.60) for $A$ eliminates the arbitrary constant. Combining (2.59) and
(2.60), we get

$$y_t = a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i\epsilon_{t-i} + \beta_1\sum_{i=0}^{\infty} a_1^i\epsilon_{t-1-i} + a_1^t\left[y_0 - a_0/(1 - a_1) - \sum_{i=0}^{\infty} a_1^i\epsilon_{-i} - \beta_1\sum_{i=0}^{\infty} a_1^i\epsilon_{-1-i}\right]$$

so that

$$y_t = a_0/(1 - a_1) + \sum_{i=0}^{t-1} a_1^i\epsilon_{t-i} + \beta_1\sum_{i=0}^{t-1} a_1^i\epsilon_{t-1-i} + [y_0 - a_0/(1 - a_1)]a_1^t \tag{2.61}$$

To this point, (2.61) is simply the general solution to the stochastic difference
equation represented by an ARMA(1, 1) process. This solution expresses the current
value of $y_t$ in terms of the constants $a_0$, $a_1$, and $\beta_1$, the
$\{\epsilon_t\}$ sequence, and the initial value of $y_0$.

The important point is that (2.61) can be used to forecast $y_t$ conditioned on
information available at period zero. Given $E_0\epsilon_i = 0$ for $i > 0$, it follows
that

$$E_0 y_t = a_0/(1 - a_1) + \beta_1 a_1^{t-1}\epsilon_0 + [y_0 - a_0/(1 - a_1)]a_1^t \tag{2.62}$$


Equation (2.62) can be viewed as the $t$-step ahead forecast function given
information available in period zero. To form the $j$-step ahead forecasts conditioned on
information available at $t$, first change the time subscript in (2.62) so that the
$j$-step ahead forecasts are

$$E_0 y_j = a_0/(1 - a_1) + \beta_1 a_1^{j-1}\epsilon_0 + [y_0 - a_0/(1 - a_1)]a_1^j$$
$$\phantom{E_0 y_j} = [a_0(1 - a_1^j)/(1 - a_1)] + \beta_1 a_1^{j-1}\epsilon_0 + y_0 a_1^j \tag{2.63}$$

Next, update (2.63) by $t$ periods so that

$$E_t y_{t+j} = [a_0(1 - a_1^j)/(1 - a_1)] + \beta_1 a_1^{j-1}\epsilon_t + y_t a_1^j \tag{2.64}$$

Equation (2.64) is in the desired form: it expresses the forecast of $y_{t+j}$
conditioned on information available at period $t$. The various $j$-step ahead forecasts
are

$$E_t y_{t+1} = a_0 + \beta_1\epsilon_t + a_1 y_t$$
$$E_t y_{t+2} = [a_0(1 - a_1^2)/(1 - a_1)] + \beta_1 a_1\epsilon_t + y_t a_1^2$$
$$E_t y_{t+3} = [a_0(1 - a_1^3)/(1 - a_1)] + \beta_1 a_1^2\epsilon_t + y_t a_1^3$$


Given that $|a_1| < 1$, the limiting value of the forecast as $j \to \infty$ is the
unconditional mean: $\lim E_t y_{t+j} = a_0/(1 - a_1)$.

As a check, you can compare (2.64) to (2.50); after all, the AR(1) and ARMA(1, 1)
models are equivalent if $\beta_1 = 0$. If $\beta_1 = 0$, (2.64) becomes

$$E_t y_{t+j} = [a_0(1 - a_1^j)/(1 - a_1)] + y_t a_1^j \tag{2.65}$$

Note that (2.65) is identical to (2.50); for $|a_1| < 1$,

$$a_0\sum_{i=0}^{j-1} a_1^i = [a_0/(1 - a_1)](1 - a_1^j)$$

The example illustrates the basic point that for any ARMA($p$, $q$) model, the
forecast function for $y_{t+j}$ will have the form

$$E_t y_{t+j} = \alpha_0(j) + \alpha_1(j)y_t + \alpha_2(j)y_{t-1} + \cdots + \alpha_p(j)y_{t-p+1} + \gamma_1(j)\epsilon_t + \cdots + \gamma_q(j)\epsilon_{t-q+1} \tag{2.66}$$

where all values of $\alpha_i(j)$ and $\gamma_i(j)$ are undetermined coefficients.

The notation $\alpha_i(j)$ and $\gamma_i(j)$ is designed to stress the point that the
coefficients are a function of $j$. Since we are working with stationary and invertible
processes, we know the nature of the solution is such that as $j \to \infty$,
$\alpha_0(j) \to a_0/(1 - \sum a_i)$, $\alpha_i(j) \to 0$, and that
$\sum[\gamma_i(j)]^2$ is finite.

In practice, you will not know the actual order of the ARMA process or the
coefficients of that process. Instead, to create out-of-sample forecasts, it is necessary
to use the estimated coefficients from what you believe to be the most appropriate
form of an ARMA model. The rule of thumb is that forecasts from an ARMA
model should never be trusted if the model is estimated with fewer than 50
observations. Suppose you have $T$ observations of the $\{y_t\}$ sequence and choose to
fit an ARMA(2, 1) model to the data. Let a hat or caret (i.e., a ^) over a parameter
denote the estimated value of a parameter and let $\{\hat{\epsilon}_t\}$ denote the
residuals of the estimated model. Hence, the estimated ARMA(2, 1) model can be written as

$$y_t = \hat{a}_0 + \hat{a}_1 y_{t-1} + \hat{a}_2 y_{t-2} + \hat{\epsilon}_t + \hat{\beta}_1\hat{\epsilon}_{t-1}$$

Given that the sample contains $T$ observations, the out-of-sample forecasts are
easily constructed. For example, you can use (2.54) to forecast the value of $y_{T+1}$ as

$$E_T y_{T+1} = \hat{a}_0 + \hat{a}_1 y_T + \hat{a}_2 y_{T-1} + \hat{\beta}_1\hat{\epsilon}_T \tag{2.67}$$

Given the estimated values of $\hat{a}_0$, $\hat{a}_1$, $\hat{a}_2$, and
$\hat{\beta}_1$, (2.67) can easily be constructed using the actual values $y_T$,
$y_{T-1}$, and $\hat{\epsilon}_T$ (i.e., the last residual of your estimated model).
Similarly, the forecast of $y_{T+2}$ can be constructed as

$$E_T y_{T+2} = \hat{a}_0 + \hat{a}_1 E_T y_{T+1} + \hat{a}_2 y_T$$

where $E_T y_{T+1}$ = the forecast from (2.67).

Given these two forecasts, all subsequent forecasts can be obtained from the
difference equation:

$$E_T y_{T+j} = \hat{a}_0 + \hat{a}_1 E_T y_{T+j-1} + \hat{a}_2 E_T y_{T+j-2} \quad \text{for } j \geq 2$$


10. A MODEL OF THE WPI 


The ARMA estimations performed in Section 8 were almost too straightforward. In
practice, we rarely find a data series precisely conforming to a theoretical ACF or
PACF. This section is intended to illustrate some of the ambiguities frequently
encountered in the Box-Jenkins technique. These ambiguities may lead two equally
skilled econometricians to estimate and forecast a series using very different
ARMA processes. Many view the necessity to rely on the researcher's judgment
and experience as a serious weakness of a procedure that is designed to be
scientific.

It is useful to illustrate the Box—Jenkins modeling procedure by estimating a 
quarterly model of the U.S. Wholesale Price Index (WPI). The file labeled 
WPI.WK1 on the data disk contains the data used in this section. Exercise 10 at the 
end of this chapter will help you to reproduce the results reported below. 

The top graph of Figure 2.5 clearly reveals that there is little point in modeling
the series as being stationary; there is a decidedly positive trend or drift throughout
the period 1960:I to 1990:IV. The first difference of the series seems to have a
constant mean, although inspection of the middle graph suggests that the variance is an
increasing function of time. As shown in the bottom graph of the same figure, the
first difference of the logarithm (denoted by $\Delta lwpi$) is the most likely candidate
to be covariance stationary. The large volatility of the WPI accompanying the oil
price shocks in the 1970s should make us somewhat wary of the assumption that
the process is covariance stationary. At this point, some researchers would make
additional transformations intended to reduce the volatility exhibited in the 1970s.
However, it seems reasonable to estimate a model of the $\{\Delta lwpi_t\}$ sequence. As
always, you should maintain a healthy skepticism of the accuracy of your model.

Before reading on, you should examine the autocorrelation and partial
autocorrelation functions of the $\{\Delta lwpi_t\}$ sequence shown in Figure 2.6. Try to
identify the tentative models that you would want to estimate. In making your decision,
note the following:


1. The ACF and PACF converge to zero reasonably quickly. We do not want to
overdifference the data and try to model the $\{\Delta^2 lwpi_t\}$ sequence.

2. The theoretical ACF of a pure MA($q$) process cuts off to zero at lag $q$ and the
theoretical ACF of an AR(1) model decays geometrically. Examination of the
two graphs of Figure 2.6 suggests that neither of these specifications seems
appropriate for the sample data.

3. The PACF is such that $\phi_{11} = 0.609$ and cuts off to 0.252 abruptly (i.e.,
$\phi_{22} = 0.252$). Overall, the PACF suggests that we should consider models such as
$p = 1$ and $p = 2$. The ACF is suggestive of an AR(2) process or a process with both
autoregressive and moving average components.

4. Note the jump in ACF at lag 4 and the small spike in the PACF at lag 4
($\phi_{44} = 0.198$). Since we are using quarterly data, we might want to incorporate a
seasonal factor at lag 4.


Figure 2.5 U.S. wholesale price index (1985 = 100). [Three panels plotted over 1960 to 1990: the level of the WPI, the first difference of the WPI, and the logarithmic change in the WPI.]

Figure 2.6 ACF and PACF for the logarithmic change in the WPI. [Two panels: autocorrelations and partial autocorrelations.]


Points 1 to 4 suggest an ARMA(1, 1) or AR(2) model. In addition, we might
want to consider models with a seasonal term at lag 4. Since computing time is
inexpensive, we can estimate a variety of models and compare their results. Table 2.4
reports estimates of five tentative models; note the following points:

1. The estimated AR(1) model confirms our analysis in the identification stage.
Although the estimated value of $a_1$ (0.618) is less than unity in absolute value
and more than eight standard deviations from zero, the AR(1) specification is
inadequate. Forming the Ljung-Box $Q$-statistic for 12 lags of the residuals yields
a value of 23.6; we can reject the null that $Q = 0$ at the 1% significance level.
Hence, the lagged residuals of this model exhibit substantial serial autocorrelation,
and we must eliminate this model from consideration.


2. The AR(2) model is an improvement over the AR(1) specification. The
estimated coefficients ($a_1 = 0.456$ and $a_2 = 0.258$) are each significantly different
from zero at the 1% level and imply characteristic roots inside the unit circle.
$Q$-statistics indicate that the autocorrelations of the residuals are not statistically
significant. As measured by the AIC, the fit of the AR(2) model is superior to that
of the AR(1); the SBC is the same for the two models. Overall, the AR(2) model
dominates the AR(1) specification.

3. The ARMA(1, 1) specification dominates the AR(2) model. The estimated
coefficients are of high quality (with $t$ values of 14.9 and -4.22). The estimated
value of $a_1$ is positive but less than unity, and the $Q$-statistics indicate that the
autocorrelations of the residuals are not statistically significant. Moreover, all
goodness-of-fit measures select the ARMA(1, 1) specification over the AR(2)
model. Thus, there is little reason to maintain the AR(2) specification.


Table 2.4: Estimates of the WPI (Logarithmic First Differences)

          p = 1             p = 2             p = 1             p = 1             p = 1
          q = 0             q = 0             q = 1             q = 1, 4          q = 2
a_0       0.011             0.011             0.012             0.011             0.012
          (4.14)            (3.31)            (2.63)            (2.76)            (2.62)
a_1       0.618             0.456             0.887             0.791             0.887
          (8.54)            (5.11)            (14.9)            (9.21)            (13.2)
a_2                         0.258
                            (2.89)
β_1                                           -0.484            -0.409            -0.483
                                              (-4.22)           (-3.62)           (-4.19)
β_2                                                                               -0.002
                                                                                  (-0.019)
β_4                                                             0.315
                                                                (3.36)
SSR       0.0156            0.0145            0.0141            0.0134            0.0141
AIC       -503.3            -506.1            -513.1            -518.2            -511.1
SBC       -497.7            -497.7            -504.7            -507.0            -499.9
Q(12)     23.6 (0.008)      11.7 (0.302)      11.7 (0.301)      4.8 (0.898)       11.7 (0.301)
Q(24)     28.6 (0.157)      15.6 (0.833)      15.4 (0.842)      9.3 (0.991)       15.3 (0.841)
Q(30)     40.1 (0.082)      22.8 (0.742)      22.7 (0.749)      14.8 (0.972)      22.6 (0.749)


Notes: Each coefficient is reported with the associated t-statistic for the null
hypothesis that the estimated value is equal to zero.

SSR is the sum of squared residuals.

Q(n) reports the Ljung-Box Q-statistic for the first n autocorrelations of the residuals
of the estimated model. With 122 observations, T/4 is approximately equal to 30.
Significance levels are in parentheses.
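The AIC and SBC entries in Table 2.4 can be approximately reproduced from the reported SSR values. The sketch below assumes the definitions $AIC = T\ln(SSR) + 2n$ and $SBC = T\ln(SSR) + n\ln(T)$, which are taken here to be the ones used earlier in the chapter; that assumption, the function names, and the use of $T = 122$ for every model are not confirmed by this section. Because the table reports rounded SSR values and the usable sample can differ across models, the results should match only roughly.

```python
import math

def aic(ssr, n_params, n_obs):
    """Assumed definition: T*ln(SSR) + 2n."""
    return n_obs * math.log(ssr) + 2 * n_params

def sbc(ssr, n_params, n_obs):
    """Assumed definition: T*ln(SSR) + n*ln(T)."""
    return n_obs * math.log(ssr) + n_params * math.log(n_obs)

T = 122
# AR(1): intercept and a_1, SSR = 0.0156 (Table 2.4).
aic_ar1, sbc_ar1 = aic(0.0156, 2, T), sbc(0.0156, 2, T)
# ARMA[1, (1, 4)]: intercept, a_1, beta_1, beta_4, SSR = 0.0134 (Table 2.4).
aic_seasonal, sbc_seasonal = aic(0.0134, 4, T), sbc(0.0134, 4, T)
```

Both criteria are smaller for the ARMA[1, (1, 4)] model than for the AR(1) model, consistent with the ranking in the table; the SBC penalizes the two extra parameters more heavily than the AIC.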


4. In order to account for the possibility of seasonality, we estimated the ARMA(1, 1)
model with an additional moving average coefficient at lag 4, that is, we
estimated a model of the form
$y_t = a_0 + a_1 y_{t-1} + \epsilon_t + \beta_1\epsilon_{t-1} + \beta_4\epsilon_{t-4}$.
More sophisticated seasonal patterns are considered in the next section. For now, note
that the additive expression $\beta_4\epsilon_{t-4}$ is often preferable to an additive
autoregressive term of the form $a_4 y_{t-4}$. For truly seasonal shocks, the expression
$\beta_4\epsilon_{t-4}$ best captures spikes, not decay, at the quarterly lags. The
coefficients of the estimated ARMA[1, (1, 4)] model are all highly significant with
$t$-statistics of 9.21, -3.62, and 3.36. The $Q$-statistics are all very low, implying
that the autocorrelations of the residuals are statistically equal to zero. Moreover,
the AIC and SBC strongly select this model over the ARMA(1, 1) model.

5. In contrast, the ARMA(1, 2) contains a superfluous coefficient. The $t$-statistic
for $\beta_2$ is sufficiently low that we should eliminate this model.


Having identified and estimated a plausible model, we want to perform additional
diagnostic checks of model adequacy. Due to the high volatility in the 1970s,
the sample was split into the two subperiods: 1960:I to 1971:IV and 1972:I to
1990:IV. Model estimates for each subperiod are

$$\Delta lwpi_t = 0.004 + 0.641\Delta lwpi_{t-1} + \epsilon_t - 0.351\epsilon_{t-1} + 0.172\epsilon_{t-4} \quad (1960{:}I\text{-}1971{:}IV)$$

and

$$\Delta lwpi_t = 0.016 + 0.753\Delta lwpi_{t-1} + \epsilon_t - 0.394\epsilon_{t-1} + 0.335\epsilon_{t-4} \quad (1972{:}I\text{-}1990{:}IV)$$

The coefficients of the two models appear to be quite similar; we can formally
test for the equality of coefficients using (2.47). Respectively, the sums of squared
residuals for the two models are $SSR_1 = 0.001359$ and $SSR_2 = 0.011681$, and from
Table 2.4 we can see that $SSR = 0.0134$. Since $T = 122$ and $n = 4$ (including the
intercept means there are four estimated coefficients), (2.47) becomes

$$F = [(0.0134 - 0.001359 - 0.011681)/4]/[(0.001359 + 0.011681)/(122 - 8)] = 0.78681$$

With 4 degrees of freedom in the numerator and 114 in the denominator, we cannot
reject the null of no structural change in the coefficients (i.e., we accept the
hypothesis that there is no change in the structural coefficients).
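The arithmetic in (2.47) can be wrapped in a small helper. The sketch below reproduces the calculation for the WPI subsamples using the sums of squared residuals quoted above; the function name is hypothetical, and the critical value needed to complete the test would come from an $F$ table or a statistics library.

```python
def split_sample_f(ssr, ssr1, ssr2, n_obs, n_params):
    """F-statistic from (2.47) with (n, T - 2n) degrees of freedom."""
    numerator = (ssr - ssr1 - ssr2) / n_params
    denominator = (ssr1 + ssr2) / (n_obs - 2 * n_params)
    return numerator / denominator

# Values from the text: SSR = 0.0134, SSR_1 = 0.001359, SSR_2 = 0.011681, T = 122, n = 4.
f_stat = split_sample_f(0.0134, 0.001359, 0.011681, 122, 4)
degrees_of_freedom = (4, 122 - 2 * 4)
```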

As a final check, out-of-sample forecasts were constructed for each of the two
models. By using additional data through 1992:II, the variance of the out-of-sample
forecast errors of the ARMA(1, 1) and ARMA[1, (1, 4)] models were calculated to
be 0.00011 and 0.00008, respectively. Clearly, all the diagnostics select the
ARMA[1, (1, 4)] model. Although the ARMA[1, (1, 4)] model appears to be
adequate, other researchers might have selected a decidedly different model. Consider
some of the alternatives listed below:




1. Trends: Although the logarithmic change of the WPI appears to be
stationary, the ACF converges to zero rather slowly. Moreover, both the
ARMA(1, 1) and ARMA[1, (1, 4)] models yield estimated values of $a_1$ (0.887
and 0.791, respectively) that are close to unity. Some researchers might have
chosen to model the second difference of the series. Others might have
detrended the data using a deterministic time trend. Chapter 4 discusses formal
tests for the appropriate form of the trend.


2. The seasonality of the data was modeled using a moving average term at lag 4.
However, there are many other plausible ways to model the seasonality in the
data, as discussed in the next section. For example, many computer programs
are capable of estimating multiplicative seasonal coefficients. Consider the
multiplicative seasonal model:

$$(1 - a_1 L)y_t = (1 + \beta_1 L)(1 + \beta_4 L^4)\epsilon_t$$

Here, the seasonal expression $\beta_4\epsilon_{t-4}$ enters the model in a
multiplicative, rather than a linear, fashion. Experimenting with various multiplicative
seasonal coefficients might be a way to improve forecasting performance.


3. Given the volatility of the $\{\Delta lwpi_t\}$ sequence during the 1970s, the
assumption of a constant variance might not be appropriate. Transforming the data using
a square root, rather than the logarithm, might be more appropriate. A general
class of transformations was proposed by Box and Cox (1964). Suppose that all
values of $\{y_t\}$ are positive so that it is possible to construct the transformed
$\{y_t^*\}$ sequence as

$$y_t^* = (y_t^{\lambda} - 1)/\lambda, \quad \lambda \neq 0$$
$$y_t^* = \ln(y_t), \quad \lambda = 0$$

The common practice is to transform the data using a preselected value of $\lambda$.
Selecting a value of $\lambda$ that is close to zero acts to "smooth" the sequence. As in
the WPI example (which simply set $\lambda = 0$), an ARMA model can be fit to the
transformed data. Although some software programs have the capacity to
simultaneously estimate $\lambda$ along with the other parameters of the ARMA model, this
approach has fallen out of fashion. Instead, it is possible to actually model the
variance using the methods discussed in Chapter 3.
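The Box-Cox transformation described in the last item is simple to apply for a preselected $\lambda$. The following is a minimal sketch; the function name and the sample values are hypothetical, and only strictly positive observations are accepted, as the text requires.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a single positive observation y for a given lambda."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)            # limiting case used in the WPI example
    return (y ** lam - 1.0) / lam

series = [1.0, 4.0, 9.0]                            # hypothetical positive data
log_transformed = [box_cox(y, 0) for y in series]
sqrt_type = [box_cox(y, 0.5) for y in series]       # lambda = 0.5: square-root-type transform
```

As $\lambda$ approaches zero, $(y^{\lambda} - 1)/\lambda$ approaches $\ln(y)$, which is why the logarithm is treated as the $\lambda = 0$ member of the family.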


11. SEASONALITY 


Many economic processes exhibit some form of seasonality. The agricultural,
construction, and travel sectors have obvious seasonal patterns resulting from their
dependence on the weather. Similarly, the Thanksgiving-Christmas holiday season
has a pronounced influence on the retail trade. In fact, the seasonal variation of
some series may account for the preponderance of its total variance. Forecasts that
ignore important seasonal patterns will have a high variance. In the last section, we
saw how the inclusion of a four-quarter seasonal factor could help improve the
model of the WPI. This section expands that discussion by illustrating some of the
techniques that can be used to identify seasonal patterns.

Too many people fall into the trap of ignoring seasonality if they are working
with deseasonalized or seasonally adjusted data. Suppose you collect a data set
that the U.S. Bureau of the Census has "seasonally adjusted" using its X-11
method. In principle, your seasonally adjusted data should have the seasonal
pattern removed. However, caution is necessary. Although a standardized procedure
may be necessary for a government agency reporting hundreds of series, the
procedure might not be best for an individual wanting to model a single series. Even if
you use seasonally adjusted data, a seasonal pattern might remain. This is
particularly true if you do not use the entire span of data; the portion of the data used
in your study can display more (or less) seasonality than the overall span. There is
another important reason to be concerned about seasonality when using
deseasonalized data. Implicit in any method of seasonal adjustment is a two-step
procedure. First, the seasonality is removed, and second, the autoregressive and moving
average coefficients are estimated using Box-Jenkins techniques. As surveyed in Bell
and Hillmer (1984), often the seasonal and ARMA coefficients are best identified
and estimated jointly. In such circumstances, it is wise to avoid using seasonally
adjusted data.


Models of Seasonal Data 


The Box-Jenkins technique for modeling seasonal data is no different from that of
nonseasonal data. The twist introduced by seasonal data of period $s$ is that the
seasonal coefficients of the ACF and PACF appear at lags $s, 2s, 3s, \ldots$, rather than at
lags $1, 2, 3, \ldots$. For example, two purely seasonal models for quarterly data might
be

$$y_t = a_4 y_{t-4} + \varepsilon_t, \qquad |a_4| < 1 \tag{2.68}$$

and

$$y_t = \varepsilon_t + \beta_4 \varepsilon_{t-4} \tag{2.69}$$

You can easily convince yourself that the theoretical correlogram for (2.68) is
such that $\rho_i = (a_4)^{i/4}$ if $i/4$ is an integer, and $\rho_i = 0$ otherwise; thus, the ACF exhibits
decay at lags 4, 8, 12, .... For model (2.69), the ACF exhibits a single spike at lag
4 and all other correlations are zero.
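The two correlograms just described can be checked numerically. The sketch below is only an illustration: the helper name `arma_acf`, the truncation length, and the coefficient value 0.5 are assumptions made for the example, not anything taken from the text. It builds the moving average representation of each process and computes the implied autocorrelations.

```python
def arma_acf(ar, ma, max_lag, n_terms=2000):
    """Theoretical ACF of y_t = sum ar[i]*y_{t-i} + e_t + sum ma[j]*e_{t-j}.

    ar and ma map a lag (int) to its coefficient, e.g. {4: 0.5}.
    The MA(infinity) representation is truncated after n_terms weights.
    """
    # psi[k] is the weight on e_{t-k} in the MA(infinity) representation.
    psi = [0.0] * n_terms
    for k in range(n_terms):
        value = 1.0 if k == 0 else ma.get(k, 0.0)
        for lag, coef in ar.items():
            if k - lag >= 0:
                value += coef * psi[k - lag]
        psi[k] = value
    # The autocovariance at lag h is proportional to sum_k psi[k]*psi[k+h].
    gamma = [sum(psi[k] * psi[k + h] for k in range(n_terms - h))
             for h in range(max_lag + 1)]
    return [g / gamma[0] for g in gamma]


# Illustrative coefficient values (assumed, not estimates from the text).
seasonal_ar = arma_acf(ar={4: 0.5}, ma={}, max_lag=12)   # form of (2.68)
seasonal_ma = arma_acf(ar={}, ma={4: 0.5}, max_lag=12)   # form of (2.69)

for lag in (1, 4, 8, 12):
    print(lag, round(seasonal_ar[lag], 4), round(seasonal_ma[lag], 4))
```

With these assumed values the autoregressive case decays geometrically at lags 4, 8, and 12, while the moving average case is nonzero only at lag 4, where the autocorrelation equals $\beta_4/(1 + \beta_4^2)$.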

In practice, identification will be complicated by the fact that the seasonal pattern
will interact with the nonseasonal pattern in the data. The ACF and PACF for a
combined seasonal/nonseasonal process will reflect both elements. Note that the
final model of the wholesale price index estimated in the last section had the form

$$y_t = a_1 y_{t-1} + \varepsilon_t + \beta_1 \varepsilon_{t-1} + \beta_4 \varepsilon_{t-4} \tag{2.70}$$




Alternatively, an autoregressive coefficient at lag 4 might have been used to
capture the seasonality:

$$y_t = a_1 y_{t-1} + a_4 y_{t-4} + \varepsilon_t + \beta_1 \varepsilon_{t-1} \tag{2.71}$$


Both these methods treat the seasonal coefficients additively; an AR or MA
coefficient is added at the seasonal period. Multiplicative seasonality allows for the
interaction of the ARMA and seasonal effects. Consider the multiplicative
specifications:

$$(1 - a_1 L)y_t = (1 + \beta_1 L)(1 + \beta_4 L^4)\varepsilon_t \tag{2.72}$$

$$(1 - a_1 L)(1 - a_4 L^4)y_t = (1 + \beta_1 L)\varepsilon_t \tag{2.73}$$

Equation (2.72) differs from (2.70) in that it allows the moving average term at
lag 1 to interact with the seasonal moving average effect at lag 4. In the same way,
(2.73) allows the autoregressive term at lag 1 to interact with the seasonal
autoregressive effect at lag 4. Many researchers prefer the multiplicative form since a rich
interaction pattern can be captured with a small number of coefficients. Rewrite
(2.72) as

$$y_t = a_1 y_{t-1} + \varepsilon_t + \beta_1 \varepsilon_{t-1} + \beta_4 \varepsilon_{t-4} + \beta_1 \beta_4 \varepsilon_{t-5} \tag{2.74}$$


Estimating only three coefficients (i.e., $a_1$, $\beta_1$, and $\beta_4$) allows us to capture the
effects of an autoregressive term at lag 1 and the effects of moving average terms at
lags 1, 4, and 5. Of course, you do not really get something for nothing. The
estimates of the three moving average coefficients are interrelated. A researcher
estimating the unconstrained model
$y_t = a_1 y_{t-1} + \varepsilon_t + \beta_1 \varepsilon_{t-1} + \beta_4 \varepsilon_{t-4} + \beta_5 \varepsilon_{t-5}$
would necessarily obtain a residual sum of squares that is no larger, since $\beta_5$ is not
constrained to equal $\beta_1 \beta_4$. However, (2.72) is clearly the more parsimonious model. If the
unconstrained value of $\beta_5$ approximates the product $\beta_1 \beta_4$, the multiplicative model will be
preferable. For this reason, most software packages have routines capable of
estimating multiplicative models. Otherwise, there are no theoretical grounds leading
us to prefer one form of seasonality over another. As illustrated in the last section,
experimentation and diagnostic checks are probably the best way to obtain the most
appropriate model.
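To see where the constrained lag-5 coefficient comes from, the product of the two moving average lag polynomials can be expanded mechanically. This is a minimal sketch: the function name `multiply_lag_polynomials` and the numerical values of the two coefficients are assumptions chosen for illustration only.

```python
def multiply_lag_polynomials(p, q):
    """Multiply two lag polynomials given as {power of L: coefficient}."""
    product = {}
    for i, a in p.items():
        for j, b in q.items():
            product[i + j] = product.get(i + j, 0.0) + a * b
    return product


# Illustrative values only; these are not estimates reported in the text.
beta_1, beta_4 = 0.5, 0.4

# (1 + beta_1 L)(1 + beta_4 L^4): the moving average side of the
# multiplicative specification.
ma_side = multiply_lag_polynomials({0: 1.0, 1: beta_1}, {0: 1.0, 4: beta_4})

for power in sorted(ma_side):
    print(f"L^{power}: {ma_side[power]}")
```

The expansion has terms at lags 0, 1, 4, and 5, and the lag-5 coefficient is the product of the other two, so two free parameters generate three moving average terms. An additive specification would instead leave the lag-5 term out, and the unconstrained model would estimate it freely.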


Seasonal Differencing 


Spain is undoubtedly the most popular destination for European vacationers.
During the months of July and August, the beaches along the Mediterranean coast
swell with tourists basking in the sun. Figure 2.7 shows the monthly number of
tourists visiting Spain between January 1970 and March 1989; the strong seasonal
pattern dominates the movement in the series. You will also note that Spain's
popularity has been growing; the series appears to be nonstationary in that the mean is
increasing over time.


Figure 2.7 Tourism in Spain. (Vertical axis: millions; horizontal axis: monthly dates.)


This combination of strong seasonality and nonstationarity is often found in
economic data. The ACF for a nonstationary seasonal process is similar to that for a
nonstationary nonseasonal process; with seasonal data the spikes at lags $s, 2s, 3s, \ldots$
do not exhibit rapid decay. The other autocorrelations are dwarfed by the seasonal
effects. Notice the ACF for the Spanish tourism data shown in Figure 2.8. The
autocorrelation coefficients at lags 12, 24, 36, and 48 are all close to unity and the seasonal
peaks decay slowly. The coefficients at lags 6, 18, 30, and 42 are all negative since
tourism is always low 6 months from the summer boom.

Let $y_t$ denote the log of the number of tourists visiting Spain each month. The first
step in the Box-Jenkins method is to difference the $\{y_t\}$ sequence so as to make it
stationary. In contrast to the other series we examined, the appropriate way to
difference strongly seasonal data is at the seasonal period. Formal tests for seasonal
differencing are examined in Chapter 4. For now, it is sufficient to note that the
seasonal difference $(1 - L^{12})y_t = y_t - y_{t-12}$ will have a smaller variance than the first
difference $y_t - y_{t-1}$. In the Spanish data, the strong seasonality means that
January-to-January and July-to-July changes are not as pronounced as the changes between
June and July. Figure 2.9 shows the first and twelfth differences of the data; clearly,
the twelfth difference has less variation and should be easier to identify and
estimate.
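The variance comparison can be illustrated on artificial data. The sketch below does not use the Spanish tourism series; it assumes a made-up monthly series consisting of a linear trend, a 12-month sine wave, and a small random disturbance, and the helper `difference` is a name introduced here for illustration.

```python
import math
import random
import statistics


def difference(series, lag):
    """Return series[t] - series[t - lag] for every t where both exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical strongly seasonal monthly series (assumed for illustration):
# trend + 12-month cycle + small noise.
random.seed(0)
y = [0.01 * t + math.sin(2 * math.pi * t / 12) + random.gauss(0, 0.1)
     for t in range(240)]

var_first = statistics.variance(difference(y, 1))     # y_t - y_{t-1}
var_twelfth = statistics.variance(difference(y, 12))  # y_t - y_{t-12}

print("variance of first difference:  ", round(var_first, 4))
print("variance of twelfth difference:", round(var_twelfth, 4))
```

Because the seasonal swing cancels when observations twelve months apart are subtracted, the twelfth difference of such a series varies far less than the month-to-month change.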

The logarithmic twelfth difference (i.e., $y_t - y_{t-12}$) displays a flat ACF showing
little tendency to decay. The first 12 of the autocorrelations are

Lag     1     2     3     4     5     6     7     8     9     10    11    12
ρ       0.26  0.31  0.26  0.28  0.23  0.24  0.19  0.21  0.19  0.20  0.15  -0.17


There is no reasonable way to fit a low-order model to the seasonally differenced
data; the seasonal differencing did not eliminate the time-varying mean. In order to
impart stationarity into the series, the next step is to take the first difference of the
already seasonally differenced data. The ACF and PACF for the series
$(1 - L)(1 - L^{12})y_t$ are shown in Figure 2.10; the properties of this series are much more
amenable to the Box-Jenkins methodology. For the first 10 coefficients, the single
spike in the ACF and uniform decay of the PACF suggest an MA(1) model. The
significant coefficients at lags 11, 12, and 13 might result from additive or
multiplicative seasonal factors. The estimates of the following three models are reported
in Table 2.5:


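Model 1: Autoregressive

$$(1 - L^{12})(1 - L)(1 - a_{12}L^{12})y_t = (1 + \beta_1 L)\varepsilon_t$$

Model 2: Multiplicative moving average

$$(1 - L^{12})(1 - L)y_t = (1 + \beta_1 L)(1 + \beta_{12}L^{12})\varepsilon_t$$

Model 3: Additive moving average

$$(1 - L^{12})(1 - L)y_t = (1 + \beta_1 L + \beta_{12}L^{12})\varepsilon_t$$
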
The point estimates of the coefficients all imply stationarity and invertibility.
Moreover, all are at least six standard deviations from zero. However, the
diagnostic statistics all suggest that model 2 is preferred. Model 2 has the best fit in that it
has the lowest sum of squared residuals (SSR). Moreover, the Q-statistics for lags


Figure 2.8 Correlogram of tourism in Spain.




Figure 2.9 First and twelfth differences. (Vertical axis: millions; series plotted: first difference and twelfth difference.)


Table 2.5: Three Models of Spanish Tourism

                 Model 1¹          Model 2           Model 3
$a_{12}$         -0.408
                 (-6.54)
$\beta_1$        -0.738            -0.740            -0.640
                 (-15.56)          (-16.14)          (-14.75)
$\beta_{12}$                       [illegible]       -0.306
                                   (-13.92)          (-7.00)
SSR              2.823             2.608             3.367
AIC              217.8             212.98            268.70
SBC              224.5             219.75            275.47
Q(12)            8.59 (0.571)      4.38 (0.928)      25.54 (0.004)
Q(24)            41.11 (0.007)     15.71 (0.830)     66.58 (0.000)
Q(48)            67.91 (0.019)     37.61 (0.806)     99.31 (0.000)

¹Clearly, there is no difference between additive seasonality and multiplicative seasonality
when all other autoregressive coefficients are zero.




12, 24, and 48 indicate that the residual autocorrelations are insignificant. In con- 
trast, the residual correlations for model 1 are significant at long lags [i.e., Q(24) 
and Q(48) are significant at the 0.007 and 0.019 levels] and the residual correla- 
tions for model 3 are significant for lags 12, 24, and 48. Other diagnostic methods 
including overfitting and splitting the sample suggest that model 2 is appropriate. 
The procedures illustrated in this example of fitting a model to highly seasonal
data are typical of many other series. With highly seasonal data, it is necessary to
supplement the Box-Jenkins method:


1. In the identification stage, it is necessary to seasonally difference the data and
check the ACF of the resultant series. Often, the seasonally differenced data will
not be stationary. In such instances, the data may also need to be
first-differenced.

2. Use the ACF and PACF to identify potential models. Try to estimate models
with low-order nonseasonal ARMA coefficients. Consider both additive and
multiplicative seasonality. Allow the appropriate form of seasonality to be
determined by the various diagnostic statistics.
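The two steps above can be sketched in code. Everything in the example is an assumption made for illustration: the helper names, the simulated series, and the coefficient values -0.7 and -0.6 are not taken from the Spanish tourism data. The Ljung-Box statistic is computed with the standard formula $Q = T(T+2)\sum_{k=1}^{s} r_k^2/(T-k)$.

```python
import random


def difference(series, lag):
    """Return series[t] - series[t - lag] for every t where both exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sample_acf(series, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag (index 0 holds r_1)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def ljung_box_q(acf_values, n_obs):
    """Ljung-Box Q = T(T+2) * sum_k r_k^2 / (T - k) over the supplied lags."""
    return n_obs * (n_obs + 2) * sum(
        r * r / (n_obs - k) for k, r in enumerate(acf_values, start=1))


# Hypothetical monthly series built so that (1 - L)(1 - L^12) y_t equals
# (1 + b1 L)(1 + b12 L^12) e_t, with made-up values b1 = -0.7, b12 = -0.6.
random.seed(1)
b1, b12 = -0.7, -0.6
e = [random.gauss(0, 1) for _ in range(613)]
y = [0.0] * 13
for t in range(13, 613):
    w = e[t] + b1 * e[t - 1] + b12 * e[t - 12] + b1 * b12 * e[t - 13]
    y.append(y[t - 1] + y[t - 12] - y[t - 13] + w)

# Step 1: seasonal difference, then first difference.
stationary = difference(difference(y, 12), 1)

# Step 2: inspect the ACF of the differenced series for candidate models.
acf = sample_acf(stationary, 24)
q24 = ljung_box_q(acf, len(stationary))
print("r_1 =", round(acf[0], 3), " r_12 =", round(acf[11], 3))
print("Q(24) of the differenced series =", round(q24, 2))
```

In this simulated case the differenced series shows negative spikes at lags 1 and 12, the pattern that points toward a low-order moving average model with a seasonal moving average term. The same `ljung_box_q` function can be applied to the residual autocorrelations of each candidate model when comparing diagnostics.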


A compact notation has been developed that allows for the efficient
representation of intricate models. As in previous sections, the $d$th difference of a series is
denoted by $\Delta^d$. For example,

$$\Delta^2 y_t = \Delta(y_t - y_{t-1}) = y_t - 2y_{t-1} + y_{t-2}$$

Figure 2.10 ACF and PACF for Spanish Tourism. (Series plotted: autocorrelations and partial autocorrelations.)

A seasonal difference is denoted by $\Delta_s$, where $s$ is the period of the data. The $D$th
such seasonal difference is $\Delta_s^D$. For example, if we wanted the second seasonal
difference of the Spanish data, we could form

$$\Delta_{12}^2 y_t = \Delta_{12}(y_t - y_{t-12})$$
$$= \Delta_{12} y_t - \Delta_{12} y_{t-12}$$
$$= y_t - y_{t-12} - (y_{t-12} - y_{t-24})$$
$$= y_t - 2y_{t-12} + y_{t-24}$$

Combining the two types of differencing yields $\Delta^d \Delta_s^D$. Multiplicative models are
written in the form ARIMA$(p, d, q)(P, D, Q)_s$,

where  p and q = the nonseasonal ARMA coefficients
       d = number of nonseasonal differences
       P = number of multiplicative autoregressive coefficients
       D = number of seasonal differences
       Q = number of multiplicative moving average coefficients
       s = seasonal period

Using this notation, we can say that the fitted model of Spanish tourism is an
ARIMA$(0, 1, 1)(0, 1, 1)_{12}$ model. In applied work, the ARIMA$(0, 1, 1)(0, 1, 1)_s$
model occurs routinely; it is called the "airline model" ever since Box and Jenkins
(1976) used this model to analyze airline travel data.


SUMMARY AND CONCLUSIONS

The chapter focuses on the Box-Jenkins (1976) approach to identification,
estimation, diagnostic checking, and forecasting a univariate time series. ARMA models
can be viewed as a special class of linear stochastic difference equations. By
definition, an ARMA model is covariance stationary in that it has a finite and
time-invariant mean and covariances. For an ARMA model to be stationary, the
characteristic roots of the difference equation must lie inside the unit circle. Moreover, the
process must have started infinitely far in the past or the process must always be in
equilibrium.

In the identification stage, the series is plotted and the sample autocorrelations
and partial correlations are examined. As illustrated using the U.S. Wholesale Price
Index, a slowly decaying autocorrelation function suggests nonstationary
behavior. In such circumstances, Box and Jenkins recommend differencing the data.
Formal tests for nonstationarity are presented in Chapter 4. A common practice is
to use a logarithmic or Box-Cox transformation if the variance does not appear to
be constant. Chapter 3 presents some modern techniques that can be used to model
the variance.

The sample autocorrelations and partial correlations of the suitably transformed
data are compared to those of various theoretical ARMA processes. All plausible
models are estimated and compared using a battery of diagnostic criteria. A
well-estimated model (1) is parsimonious; (2) has coefficients that imply stationarity and
invertibility; (3) fits the data well; (4) has residuals that approximate a white-noise
process; (5) has coefficients that do not change over the sample period; and (6) has
good out-of-sample forecasts.

In utilizing the Box-Jenkins methodology, you will find yourself making many
seemingly ad hoc choices. The most parsimonious model may not have the best fit
or out-of-sample forecasts. You will find yourself addressing the following types of
questions: What is the most appropriate data transformation? Is an ARMA(2, 1)
model more appropriate than an ARMA(1, 2) specification? How to best model
seasonality? Given this latitude, many view the Box-Jenkins methodology as an art
rather than a science. Nevertheless, the technique is best learned through
experience. The exercises at the end of this chapter are designed to guide you through the
types of choices you will encounter in your own research.


QUESTIONS AND EXERCISES

1. In the coin-tossing example of Section 1, your winnings on the last four tosses
($w_t$) can be denoted by

$$w_t = 1/4\,\varepsilon_t + 1/4\,\varepsilon_{t-1} + 1/4\,\varepsilon_{t-2} + 1/4\,\varepsilon_{t-3}$$

A. Find the expected value of $w_t$. Find the expected value given that
$\varepsilon_{t-3} = \varepsilon_{t-2} = 1$.

B. Find var($w_t$). Find var($w_t$) conditional on $\varepsilon_{t-3} = \varepsilon_{t-2} = 1$.

C. Find: i. Cov($w_t$, $w_{t-1}$) ii. Cov($w_t$, $w_{t-2}$) iii. Cov($w_t$, $w_{t-5}$)

2. Substitute (2.10) into $y_t = a_0 + a_1 y_{t-1} + \varepsilon_t$. Show that the resulting equation is an
identity.

A. Find the homogeneous solution to $y_t = a_0 + a_1 y_{t-1} + \varepsilon_t$.

B. Find the particular solution given that $|a_1| < 1$.

C. Show how to obtain (2.10) by combining the homogeneous and particular
solutions.

3. Consider the second-order autoregressive process $y_t = a_0 + a_2 y_{t-2} + \varepsilon_t$, where
$|a_2| < 1$.

A. Find: i. E[illegible] ii. E[illegible] iii. E[illegible]
iv. Cov($y_t$, $y_{t-1}$) v. Cov($y_t$, $y_{t-2}$) vi. The partial autocorrelations
$\phi_{11}$ and $\phi_{22}$

B. Find the impulse response function. Given $y$ [subscript illegible], trace out the effects of an $\varepsilon_t$
shock on the $\{y_t\}$ sequence.




C. Determine the forecast function $E_t y_{t+s}$. The forecast error $f_t$ is the
difference between $y_{t+s}$ and $E_t y_{t+s}$. Derive the correlogram of the $\{f_t\}$ sequence.
[Hint: Find $E_t f_t$, var($f_t$), and $E(f_t f_{t-j})$ for $j = 0$ to $s$.]


4. Two different balls are drawn from a jar containing three balls numbered 1, 2,
and 4. Let x = number on the first ball drawn and y = sum of the two balls
drawn.

A. Find the joint probability distribution for x and y; that is, find prob(x = 1,
y = 3), prob(x = 1, y = 5), ..., and prob(x = 4, y = 6).

B. Find each of the following: E(x), E(y), E(y | x = 1), E(x | y = 5), var(x | y =
5), and E($y^2$).

C. Consider the two functions $w_1 = 3x^2$ and $w_2 = x^{-1}$. Find E($w_1 + w_2$) and
E($w_1 + w_2$ | y = 3).

D. How would your answers change if the balls were drawn with replacement?


5. The general solution to an nth-order difference equation requires n arbitrary
constants. Consider the second-order equation $y_t = a_0 + 0.75y_{t-1} - 0.125y_{t-2} + \varepsilon_t$.

A. Find the homogeneous and particular solutions. Discuss the shape of the
impulse response function.

B. Find the values of the initial conditions (and $A_1$ and $A_2$) that ensure the $\{y_t\}$
sequence is stationary. (Note: $A_1$ and $A_2$ are the arbitrary constants in the
homogeneous solution.)

C. Given your answer to part B, derive the correlogram for the $\{y_t\}$ sequence.


6. Consider the second-order stochastic difference equation $y_t = 1.5y_{t-1} - 0.5y_{t-2} + \varepsilon_t$.

A. Find the characteristic roots of the homogeneous equation.

B. Demonstrate that the roots of $1 - 1.5L + 0.5L^2$ are the reciprocals of your
answer in part A.

C. Given initial conditions for $y_0$ and $y_1$, find the solution for $y_t$ in terms of the
current and past values of the $\{\varepsilon_t\}$ sequence. Explain why it is not possible
to obtain the backward-looking solution for $y_t$ unless such initial conditions
are given.

D. Find the forecast function for $y_{t+s}$.

E. Find: $Ey_t$, $Ey_{t+1}$, var($y_t$), var($y_{t+1}$), and cov($y_{t+1}$, $y_t$).


7. The file entitled SIM_2.WK1 contains the simulated data sets used in this
chapter. The first column contains the 100 values of the simulated AR(1)
process used in Section 7. This first series is entitled Y1. Use this series to perform
the following tasks. (Note: Due to differences in data handling and rounding, your
answers need only approximate those presented here.)

A. Plot the sequence against time. Do the data appear to be stationary? Show
that the properties of the sequence are

Sample mean   -0.5707418062     Variance                    [illegible]39987
Skewness      -0.31011          Significance level (Sk=0)   0.21239328


B. Verify that the first 12 coefficients of the ACF and PACF are

ACF:
1:   0.7394472   0.5842742   0.4711050   0.3885974   0.3443779   0.3350913
7:   0.2972263   0.3251532   0.2689484   0.2007989   0.1886648   0.0824283

PACF:
1:   0.7394472   0.0827240   0.0302925   0.0255945   0.0601115   0.0889358
7:  -0.0165339   0.1438633  -0.1002335  -0.0653566   0.0699036  -0.2040202

Ljung-Box Q-statistics: Q(8) = 177.5774,
Q(16) = 197.8423, Q(24) = 201.2825


C. Use the data to verify the results given in Table 2.2.

D. Determine whether it is appropriate to include a constant in the AR(1)
process. You should obtain the following estimates:

Coefficient     Estimate        Standard Error   t-Statistic   Significance Level
1. CONSTANT    -0.538045291     0.380434146      -1.41429      0.16044514
2. AR{1}        0.756861387     0.067241069      11.25594      0.00000000


E. Estimate the series as an AR(2) process without an intercept. You should
obtain:

Coefficient     Estimate        Standard Error   t-Statistic   Significance Level
1. AR{1}        0.7048671016    0.0993987373     7.09131       0.00000000
2. AR{2}        0.1094585628    0.0986680252     1.10936       0.26998889

Ljung-Box Q-statistics: Q(8) = 5.1317, Q(16) = 15.8647, Q(24) = 21.0213


F. Estimate the series as an ARMA(1, 1) process without an intercept. You
should obtain:

Coefficient     Estimate        Standard Error   t-Statistic   Significance Level
1. AR{1}        0.846376753     0.068533381      12.34985      0.00000000
2. MA{1}       -0.148770547     0.125784398      -1.18274      0.23977273

Verify that the first 12 coefficients of the ACF and PACF of the residuals are:

ACF:
1:  -0.0069909  -0.0365955  -0.0375520  -0.0749124  -0.0683620   0.0546530
7:  -0.0808082   0.1598166   0.0732022  -0.0080406   0.1686742  -0.0484844

PACF:
1:  -0.0069909  -0.0366462  -0.0381264  -0.0770739  -0.0733243   0.0460005
7:  -0.0923797   0.1542973   0.0630681   0.0027253   0.1917630  -0.0374165

Ljung-Box Q-statistics: Q(8) = 5.2628, significance level 0.51057476
Q(16) = 15.7449, significance level 0.32919794
Q(24) = 21.0950, significance level 0.51487365


G. Compare the AIC and SBC values from the models estimated in parts D, E, 
and F. 


8. The second column in the file entitled SIM_2.WK1 contains the 100 values of the
simulated ARMA(1, 1) process used in Section 7. This series is entitled Y2.
Use this series to perform the following tasks. (Note: Due to differences in data
handling and rounding, your answers need only approximate those presented
here.)

A. Plot the sequence against time. Do the data appear to be stationary? Show
that the properties of the sequence are:

Sample mean   0.022548180000    Variance                    5.743104
Skewness      -0.06175          Significance level (Sk=0)   0.80390523

ACF:
1:  -0.8343833   0.5965289  -0.4399659   0.3497724  -0.3187446   0.3316348
7:  -0.3371782   0.3166057  -0.2761498   0.1789268  -0.0839171   0.0375968

PACF:
1:  -0.8343833  -0.3280611  -0.1942907  -0.0145160  -0.1398293   0.0891764
7:   0.0004335   0.0143663   0.0166776  -0.1987829  -0.0462213  -0.0212410

B. Verify the results in Table 2.3. 


C. Estimate the process using a pure MA(2) model. You should obtain:

Coefficient     Estimate        Standard Error   t-Statistic   Significance Level
1. MA{1}       -1.152648087     0.087208938      -13.21709     0.00000000
2. MA{2}        0.521919469     0.087336869      5.97594       0.00000004

D. Verify that the first 12 coefficients of the ACF and PACF of the residuals
are

ACF:
1:  -0.1281102   0.2841720  -0.2721070   0.0641308  -0.1690135   0.1591088
7:  -0.1711865   0.1009624  -0.2300744   0.0202338  -0.0918914  -0.0507396

PACF:
1:  -0.1281102   0.2722277  -0.2314021  -0.0521753  -0.0407344   0.0989550
7:  -0.1253922  -0.0203505  -0.1278106  -0.0870339   0.0170745  -0.1709188

Ljung-Box Q-statistics: Q(8) = 28.4771, significance level 0.00007638
Q(16) = 37.4666, significance level 0.00062675
Q(24) = 38.8424, significance level 0.01470990

9. The third column in SIM_2.WK1 contains the 100 values of an AR(2) process;
this series is entitled Y3. Use this series to perform the following tasks. (Note:
Due to differences in data handling and rounding, your answers need only
approximate those presented here.)

A. Plot the sequence against time. Verify the ACF and PACF coefficients
reported in Section 7. Compare the sample ACF and PACF to those of a
theoretical AR(2) process.

B. Estimate the series as an AR(1) process. You should find:

Coefficient     Estimate        Standard Error   t-Statistic   Significance Level
1. AR{1}        0.4676067905    0.0892951880     5.23664       0.00000093

ACF of the Residuals:
1:   0.2226399  -0.3349466  -0.3386407   0.0569540   0.0807033  -0.1656232
7:  -0.1358947   0.1490039   0.1810292  -0.0022135  -0.0893834  -0.0245175

PACF of the Residuals:
1:   0.2226399  -0.4045690  -0.1809423   0.0803672  -0.1663664  -0.2353309
7:  -0.0327129   0.0578083  -0.0587342   0.0005358  -0.0381843   0.0422312

Ljung-Box Q-statistics: Q(8) = 36.9968, significance level 0.00000470
Q(16) = 55.8708, significance level 0.00000127
Q(24) = 69.0486, significance level 0.00000170


C. Why is the AR(1) model inadequate?

D. Could an ARMA(1, 1) process generate the type of sample ACF and PACF
found in part A? Estimate the series as an ARMA(1, 1) process. You
should obtain:

Coefficient     Estimate        Standard Error   t-Statistic   Significance Level
1. AR{1}        0.1861328174    0.1592235925     1.16900       0.24526729
2. MA{1}        0.5057665581    0.1407905283     3.59233       0.00051680

ACF of the Residuals:
1:   0.0284101  -0.1131579  -0.3143993   0.0716440   0.0162748  -0.1298382
7:  -0.1197985   0.1392267   0.1194444   0.0174992  -0.1155456   0.0427301

PACF of the Residuals:
1:   0.0284101  -0.1140571  -0.3118831   0.0757999  -0.0596767  -0.2396433
7:  -0.0872039   0.1041284  -0.0272326  -0.0175071  -0.0164607   0.0486076

Ljung-Box Q-statistics: Q(8) = 17.7685, significance level 0.00683766
Q(16) = 37.0556, significance level 0.00072359
Q(24) = 44.9569, significance level 0.00268747

Why is the ARMA(1, 1) model inadequate?


E. Estimate the series as an AR(2) process to verify the results reported in the
text. Also show that

ACF of the Residuals:
1:   0.0050856   0.0167033  -0.1311013   0.0737802  -0.0183142  -0.1857531
7:  -0.1223167   0.1169804   0.0827464  -0.0445903  -0.1014803   0.0879798
13: -0.1499004   0.0365971  -0.1062701   0.2608459  -0.0365855  -0.1119749
19: -0.0855518   0.0179101   0.0695385  -0.1661957  -0.0183144   0.0479631

PACF of the Residuals:
1:   0.0050856   0.0166779  -0.1313096   0.0764420  -0.0160463  -0.2098313
7:  -0.1023138   0.1265615   0.0378627  -0.0653412  -0.0679885   0.0629571
13: -0.2287224   0.0563135  -0.0068239   0.2076758  -0.0936362  -0.1587757
19: -0.0419646  -0.0410407   0.0716762  -0.1014686   0.0384143  -0.0779761

Ljung-Box Q-statistics: Q(8) = 9.2697, significance level 0.15896993
Q(16) = 24.6248, significance level 0.03845761
Q(24) = 31.8487, significance level 0.08001287


The Q-statistics indicate that the autocorrelations at longer lags are statistically 


different from zero at the usual significance levels. Why might you choose not to 
model such long lags when using actual economic data? 


F. Now estimate the series as an AR(2) but also include a moving average
term at lag 16. Show that the residuals are such that

ACF of the Residuals:
1:   0.0265736   0.0040771  -0.0933018   0.0858766   0.0225622  -0.1521287
7:  -0.1643954   0.0947202   0.1447444  -0.0017055  -0.0718022   0.0512581
13: -0.1023376   0.0151149  -0.1029252   0.0174225  -0.0629532  -0.1078434
19: -0.0754905  -0.0307818   0.0130560  -0.1275938   0.0223896   0.0338157

PACF of the Residuals:
1:   0.0265736   0.0033733  -0.0935665   0.0917077   0.0182999  -0.1663372
7:  -0.1432380   0.1106009   0.1204167  -0.0169905  -0.0350092   0.0517180
13: -0.1887574  -0.0078523  -0.0014991  -0.0232808  -0.0985569  -0.1417484
19: -0.0753388  -0.0797882   0.0086627  -0.1045587   0.0291697  -0.0227024

Ljung-Box Q-statistics: Q(8) = 8.2222, significance level 0.14440657
Q(16) = 13.9801, significance level 0.37524746
Q(24) = 19.0856, significance level 0.57964913

G. Compare the AIC and SBC values from the models estimated in parts B, D,
E, and F.


10. The file called WPI.WK1 contains the U.S. Wholesale Price Index from
1960:Q1 to 1992:Q2. Make the data transformations indicated in the text.

A. Use the sample from 1960:Q1 to 1990:Q4 in order to reproduce the results
of Section 10.

B. Use the fitted model to create "out-of-sample" forecasts for the 1991:Q1 to
1992:Q2 period.

C. Consider some of the plausible alternative models suggested in the text.
i. Try to fit a model to the second-difference of the logarithm of the WPI.
ii. Estimate the multiplicative seasonal model

D. Compare these models to that of part B.


11. The file entitled US.WK1 contains quarterly values of the U.S. money supply
(M1) from 60:Q1 to 91:Q4.


A. Plot the sequence against time. Verify that the properties of the sequence
are

Sample mean   3.80169890625     Variance                    5.260577E+22
Skewness      0.83949           Significance level (Sk=0)   0.00012712

B. Detrend the data by estimating the regression:

$$\Delta \log(M1)_t = a_0 + b(\text{time}) + \varepsilon_t$$

The ACF of the residuals is

1:   0.8835022   0.8752123   0.8064355   0.8334758   0.7165115   0.696831
7:   0.6249026   0.6437679   0.5285896   0.5118881   0.4507793   0.4770092

Ljung-Box Q-statistics: Q(8) = 630.0809, significance level 0.000
Q(16) = 836.4612, significance level 0.000

Does detrending seem to render the sequence stationary?


C. Calculate the ACF and PACF of the first difference of log(M1). You should
obtain:

ACF:
1:  -0.5394848   0.3234781  -0.5573607   0.8528067  -0.5168406   0.2986240
7:  -0.5523817   0.7950047  -0.5096188   0.2695013  -0.5425407   0.7549618

PACF:
1:  -0.5394848   0.0457493  -0.5175494   0.7167389  -0.0356317  -0.1396979
7:  -0.0457462   0.1998479  -0.0995162  -0.1475262  -0.0125845   0.0905883

Explain the observed pattern at lags 4, 8, and 12.


D. Seasonally difference the money supply as
$\Delta_4 \log(M1)_t = \Delta \log(M1)_t - \Delta \log(M1)_{t-4}$. You should find that the ACF and PACF are

ACF:
1:   0.8585325   0.7148654   0.5452426   0.3963377   0.3401345   0.2636718
7:   0.1814409   0.0991204   0.0554050   0.0287039   0.0423198   0.0651970

PACF:
1:   0.8585325  -0.0844838  -0.1831526  -0.0283342   0.2688532  -0.1594976
7:  -0.1789985  -0.0055668   0.2312324  -0.0787959  -0.0015501   0.0736405


E. For convenience, let $m1_t$ denote $\Delta_4 \log(M1)$. Estimate the seasonally
differenced log of the money supply as the AR(1) process:

$$m1_t = a_0 + a_1 m1_{t-1} + \varepsilon_t$$

Coefficient     Estimate     Standard Error    t-Statistic    Significance Level
CONSTANT        0.06217      0.0090502490      6.86967        0.000000
AR{1}           0.86241      0.044662283110    [illegible]    [illegible]

Examine the diagnostic statistics to show that this model is inappropriate.


F. Estimate log(M1) using each of the following:

ARIMA(1, 0, 0)(0, 1, 1)
ARIMA[1, 0, (4)](0, 1, 0)

Why is each inadequate?


G. Define $\Delta m1_t = m1_t - m1_{t-1}$ so that $\Delta m1_t$ is the first difference of the seasonal
difference of the money supply. Estimate $\Delta m1_t$ as

$$\Delta m1_t = (1 + \beta_4 L^4)\varepsilon_t$$

You should obtain:

Coefficient     Estimate        Standard Error   t-Statistic   Significance Level
MA{4}          -0.672328387     0.071121156      -9.45328      0.00000000

ACF of Residuals:
1:   0.0616653   0.1387445  -0.0388472   0.0720538   0.0875724   0.0110692
7:  -0.0622441  -0.0953258  -0.0131446  -0.1265891  -0.0802878  -0.0407282

PACF of Residuals:
1:   0.0616653   0.1354570  -0.0558297   0.0601665   0.0952727  -0.0207820
7:  -0.0826424  -0.0831404   0.0052625  -0.1232642  -0.0717116   0.0263945

Ljung-Box Q-statistics: Q(8) = 6.5331, significance level 0.479
Q(16) = 10.3813, significance level 0.795
Q(24) = 14.0666, significance level 0.925
Q(32) = 17.4491, significance level 0.976

Explain why this model is superior to any of those in part F.

ENDNOTES


1. The appendix to this chapter provides a review of constructing joint probabilities,
expected values, and variances.

2. Some authors let T equal the maximum number of observations that can be used in the
estimation; hence, T changes with the number of parameters estimated. Since there is no
underlying distributional theory associated with the AIC and SBC, this procedure cannot be
said to be incorrect. Also be aware that there are several equivalent formulations of the
AIC and SBC. Your software package may not yield the precise numbers reported in the
text.

3. Nearly all econometric software packages contain a Box-Jenkins estimation procedure.
Mechanics of the estimation usually entail nothing more than specifying the number of
autoregressive and moving average coefficients to include in the estimated model.

4. Most software programs will not be able to estimate (2.43) since there is not a unique set
of parameter values that minimizes the likelihood function.

5. Some software programs report the Durbin-Watson test statistic as a check for first-order
serial correlation. This well-known test statistic is biased toward finding no serial
correlation in the presence of lagged dependent variables. Hence, it is usually not used in
ARMA models.

6. Estimation of an AR(p) model usually entails a loss of the number of usable
observations. Hence, to estimate a sample using T observations, it will be necessary to have (T +
p) observations. Also note that the procedure outlined necessitates that the second
subsample period incorporate the lagged values $t_m, t_{m-1}, \ldots, t_{m-p+1}$.

7. Many of the details concerning optimal forecasts are contained in the appendix to Chapter
3.

8. In essence, the estimated equation is an ARMA(1, 4) model with the coefficients $\beta_2$ and
$\beta_3$ constrained to be equal to zero. In order to distinguish between the two specifications,
the notation ARMA[1, (1, 4)] is used to indicate that only the moving average terms at
lags 1 and 4 are included in the model.

9. The details of the X-11 procedure are not important for our purposes. The SAS statistical
package can perform the X-11 procedure. The technical details of the procedure are
explained in the Bureau of the Census report (1969).


APPENDIX Expected Values and Variance


1. Expected value of a discrete random variable

A random variable $x$ is defined to be discrete if the range of $x$ is countable. If $x$
is discrete, there is a finite set of numbers $x_1, x_2, \ldots, x_n$ such that $x$ takes on
values only in that set. Let $f(x_j)$ = the probability that $x = x_j$. The mean or
expected value of $x$ is defined to be

$$E(x) = \sum_{j=1}^{n} x_j f(x_j)$$

Note the following:

1. We can let $n$ go to infinity; the notion of a discrete variable is that the set be
denumerable or a countable infinity. For example, the set of all positive
integers is discrete.

2. If $\sum x_j f(x_j)$ does not converge, the mean is said not to exist.

3. $E(x)$ is an "average" of the possible values of $x$; in the sum, each possible $x_j$
is weighted by the probability that $x = x_j$; that is,

$$E(x) = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$

where $\sum w_j = 1$.


a 


2. Expected value of a continuous random variable

Now let x be a continuous random variable. Denote the probability that x is in the interval $(x_0, x_1)$ by $f(x_0 \le x \le x_1)$. If the function f(x) is depicted by Figure A2.1, it follows that

$$f(x_0 \le x \le x_1) = \int_{x_0}^{x_1} f(x)\,dx$$

The mean, or expected value, of x is

$$E(x) = \int_{-\infty}^{\infty} x f(x)\,dx$$


3. Expected value of a function

Let x be a random variable and g(x) a function. The mean or expected value of g(x) is

$$E[g(x)] = \sum_{j=1}^{n} g(x_j) f(x_j)$$

for discrete x or

$$E[g(x)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$$

for continuous x. Note: If g(x) = x, we obtain the simple mean.

Figure A2.1 Frequency of x.


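4. Properties of the expectations operator

1. The expected value of a constant c is the value of the constant. That is, E(c) = c.

Proof:

$$E(c) = \int_{-\infty}^{\infty} c f(x)\,dx = c \int_{-\infty}^{\infty} f(x)\,dx = c$$

2. The expected value of a constant times a function is the constant times the expected value of the function.

$$E[c\,g(x)] = c\,E[g(x)]$$

Proof:

$$E[c\,g(x)] = \int_{-\infty}^{\infty} c\,g(x) f(x)\,dx = c \int_{-\infty}^{\infty} g(x) f(x)\,dx = c\,E[g(x)]$$

3. The expected value of a sum is the sum of the expectations:

$$E[c_1 g_1(x) \pm c_2 g_2(x)] = c_1 E[g_1(x)] \pm c_2 E[g_2(x)]$$

Proof:

$$\int_{-\infty}^{\infty} [c_1 g_1(x) \pm c_2 g_2(x)] f(x)\,dx = \int_{-\infty}^{\infty} c_1 g_1(x) f(x)\,dx \pm \int_{-\infty}^{\infty} c_2 g_2(x) f(x)\,dx = c_1 E[g_1(x)] \pm c_2 E[g_2(x)]$$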


5. Variance of a Random Variable
The variance of x is defined such that $\operatorname{var}(x) = E\{[x - E(x)]^2\}$:

$$\operatorname{Var}(x) = E[x^2 - 2xE(x) + E(x)E(x)]$$

Since E(x) is a constant, $E[E(x)] = E(x)$ and $E[xE(x)] = [E(x)]^2$. Using these results and the property that the expectation of a sum is the sum of the expectations, we obtain

$$\operatorname{Var}(x) = E(x^2) - 2E[xE(x)] + [E(x)]^2 = E(x^2) - [E(x)]^2$$


6. Jointly Distributed Discrete Random Variables

Let x and y be random variables such that x takes on values $x_1, x_2, \ldots, x_n$ and y takes on values $y_1, y_2, \ldots, y_m$. Let $f_{ij}$ denote the probability that $x = x_i$ and $y = y_j$. If g(x, y) denotes a function of x and y, the expected value of the function is

$$E[g(x, y)] = \sum_{i=1}^{n}\sum_{j=1}^{m} f_{ij}\,g(x_i, y_j)$$


Expected value of a sum
Let the function g(x, y) be x + y. The expected value of x + y is:

$$
\begin{aligned}
E(x + y) &= \sum_{i=1}^{n}\sum_{j=1}^{m} f_{ij}(x_i + y_j) \\
&= \sum_{i=1}^{n}\sum_{j=1}^{m} f_{ij}x_i + \sum_{i=1}^{n}\sum_{j=1}^{m} f_{ij}y_j \\
&= \sum_{i=1}^{n} (f_{i1}x_i + f_{i2}x_i + \cdots + f_{im}x_i) + \sum_{j=1}^{m} (f_{1j}y_j + f_{2j}y_j + \cdots + f_{nj}y_j)
\end{aligned}
$$

Note that $(f_{11} + f_{12} + f_{13} + \cdots + f_{1m})$ is the probability that x takes on the value $x_1$, denoted by $f_1$. More generally, $(f_{i1} + f_{i2} + f_{i3} + \cdots + f_{im})$ is the probability that x takes on the value $x_i$, denoted by $f_i$ or $f(x_i)$. Since $(f_{1j} + f_{2j} + f_{3j} + \cdots + f_{nj})$ is the probability that $y = y_j$, denoted by $f(y_j)$, the two summations above can be written as

$$E(x + y) = \sum x_i f(x_i) + \sum y_j f(y_j) = E(x) + E(y)$$

Hence, we have generalized the result of 4.3 above to show that the expected value of a sum is the sum of the expectations.


7. Covariance and Correlation

The covariance between x and y, cov(x, y), is defined to be

$$\operatorname{Cov}(x, y) = E\{[x - E(x)][y - E(y)]\}$$

Multiply [x - E(x)] by [y - E(y)] and use the property that the expected value of a sum is the sum of the expectations:

$$\operatorname{Cov}(x, y) = E(xy) - E[xE(y)] - E[yE(x)] + E[E(x)E(y)] = E(xy) - E(x)E(y)$$

The correlation coefficient between x and y is defined to be

$$\rho_{xy} = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{var}(x)}\sqrt{\operatorname{var}(y)}}$$

Since cov(x, y) = E(xy) - E(x)E(y), we can express the expectation of the product of x and y, E(xy), as

$$E(xy) = E(x)E(y) + \operatorname{cov}(x, y) = E(x)E(y) + \rho_{xy}\sigma_x\sigma_y$$

where the standard deviation of variable z (denoted by $\sigma_z$) = the positive square root of var(z).


8. Conditional Expectation
Let x and y be jointly distributed random variables, where $f_{ij}$ denotes the probability that $x = x_i$ and $y = y_j$. Each of the $f_{ij}$ values is a conditional probability; each is the probability that x takes on the value $x_i$ given that y takes on the specific value $y_j$.
The expected value of x conditional on y taking on the value $y_j$ is:

$$E[x \mid y_j] = f_{1j}x_1 + f_{2j}x_2 + \cdots + f_{nj}x_n$$


9. Statistical Independence

If x and y are statistically independent, the probability of $x = x_i$ and $y = y_j$ is the probability that $x = x_i$ multiplied by the probability that $y = y_j$. If we use the notation in number 6 above, two events are statistically independent if and only if $f_{ij} = f(x_i)f(y_j)$. For example, if we simultaneously toss a fair coin and roll a fair die, the probability of obtaining a head and a three is 1/12; the probability of a head is 1/2 and the probability of obtaining a three is 1/6.

An extremely important implication follows directly from this definition. If x and y are independent events, the expected value of the product of the outcomes is the product of the expected outcomes:

$$E(xy) = E(x)E(y)$$

The proof is straightforward. Form E(xy) as

$$
\begin{aligned}
E(xy) = {} & f_{11}x_1y_1 + f_{12}x_1y_2 + f_{13}x_1y_3 + \cdots + f_{1m}x_1y_m \\
& + f_{21}x_2y_1 + f_{22}x_2y_2 + f_{23}x_2y_3 + \cdots + f_{2m}x_2y_m \\
& + \cdots + f_{n1}x_ny_1 + f_{n2}x_ny_2 + f_{n3}x_ny_3 + \cdots + f_{nm}x_ny_m
\end{aligned}
$$

Since x and y are independent, $f_{ij} = f(x_i)f(y_j)$. Hence

$$
\begin{aligned}
E(xy) &= \sum_{i=1}^{n} f(x_i)f(y_1)x_iy_1 + \sum_{i=1}^{n} f(x_i)f(y_2)x_iy_2 + \cdots + \sum_{i=1}^{n} f(x_i)f(y_m)x_iy_m \\
&= f(y_1)y_1\sum_{i=1}^{n} f(x_i)x_i + f(y_2)y_2\sum_{i=1}^{n} f(x_i)x_i + \cdots + f(y_m)y_m\sum_{i=1}^{n} f(x_i)x_i
\end{aligned}
$$

Recall that $\sum f(x_i)x_i = E(x)$. Thus

$$E(xy) = E(x)[f(y_1)y_1 + f(y_2)y_2 + \cdots + f(y_m)y_m]$$

so that E(xy) = E(x)E(y). Since cov(x, y) = E(xy) - E(x)E(y), it immediately follows that the covariance and correlation coefficient of two independent events is zero.
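The coin-and-die example can be enumerated exactly as a check on this result. The following is only an illustrative sketch: coding a head as x = 1 and a tail as x = 0 is an assumption made here so that the coin has a numerical outcome, and all variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Marginal distributions. Coding head = 1 and tail = 0 is an assumption of this sketch.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
die = {spots: Fraction(1, 6) for spots in range(1, 7)}

# Under independence, each joint probability is the product of the marginals.
joint = {(x, y): coin[x] * die[y] for x, y in product(coin, die)}

E_x = sum(x * p for x, p in coin.items())
E_y = sum(y * p for y, p in die.items())
E_xy = sum(x * y * p for (x, y), p in joint.items())

# cov(x, y) = E(xy) - E(x)E(y), which should be zero for independent variables.
cov_xy = E_xy - E_x * E_y

print(joint[(1, 3)])   # probability of a head and a three
print(E_xy, E_x * E_y, cov_xy)
```

Exact fractions are used so that the equality E(xy) = E(x)E(y) holds without rounding error.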


10. An Example of Conditional Expectation

Since the concept of conditional expectation plays such an important role in modern macroeconomics, it is worthwhile to consider the specific example of tossing dice. Let x denote the number of spots showing on die 1, y the number of spots on die 2, and S the sum of the spots (S = x + y). Each die is fair so that the probability of any face turning up is 1/6. Since the outcomes on die 1 and die 2 are independent events, the probability of any specific values for x and y is the product of the probabilities. The possible outcomes and the probability associated with each outcome S are

S      2     3     4     5     6     7     8     9     10    11    12
f(S)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36


To find the expected value of the sum S, multiply each possible outcome by the probability associated with that outcome. As you well know if you have been to Las Vegas, the expected value is 7. Suppose that you roll the dice se-




a probability of 1/6. Given x = 3, the possible outcomes for S are 4 through 9, each with a probability of 1/6. Hence, the conditional expectation of S given three spots on die 1 is E(S | x = 3) = (1/6)4 + (1/6)5 + (1/6)6 + (1/6)7 + (1/6)8 + (1/6)9 = 6.5.
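The dice calculations above can be reproduced by direct enumeration. This is a minimal sketch with hypothetical names; it simply lists the 36 equally likely outcomes and applies the definitions of the unconditional and conditional expectation.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p_pair = Fraction(1, 36)          # each (x, y) pair is equally likely for two fair dice

# Distribution of the sum S = x + y.
f_S = {}
for x, y in product(faces, faces):
    f_S[x + y] = f_S.get(x + y, Fraction(0)) + p_pair

# Unconditional expectation: sum over outcomes of s * f(s).
E_S = sum(s * prob for s, prob in f_S.items())

def conditional_expectation_of_sum(x_value):
    """E(S | x = x_value): die 2 still takes each face with probability 1/6."""
    return sum(Fraction(1, 6) * (x_value + y) for y in faces)

print(E_S)                                  # 7
print(conditional_expectation_of_sum(3))    # 13/2, i.e., 6.5
```

Knowing the outcome on die 1 changes the forecast of S from 7 to 6.5, which is the sense in which a conditional expectation uses more information than the unconditional mean.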


Chapter 3


MODELING ECONOMIC
TIME SERIES: TRENDS AND
VOLATILITY


Many economic time series do not have a constant mean and most exhibit phases of relative tranquility followed by periods of high volatility. Much of the current econometric research is concerned with extending the Box-Jenkins methodology to analyze this type of time-series behavior. The aims of this chapter are to:

1. Examine the so-called stylized facts concerning the properties of economic time-series data. Casual inspection of GNP, financial aggregates, interest and exchange rates suggests they do not have a constant mean and variance. A stochastic variable with a constant variance is called homoskedastic as opposed to heteroskedastic.¹ For series exhibiting volatility, the unconditional variance may be constant even though the variance during some periods is unusually large. You will learn how to use the tools developed in Chapter 2 to model such conditional heteroskedasticity.

2. Formalize simple models of variables with a time-dependent mean. Certainly, the mean value of GNP, various price indices, and the money supply have been increasing over time. The trends displayed by these variables may contain deterministic and/or stochastic components. Learning about the properties of the two types of trends is important. It makes a great deal of difference if a series is estimated and forecasted under the hypothesis of a deterministic versus stochastic trend.

3. Illustrate the difference between stochastic and deterministic trends by considering the modern view of the business cycle. A methodology that can be used to decompose a series into its temporary and permanent components is presented.


1. ECONOMIC TIME SERIES: THE STYLIZED FACTS


Figures 3.1 through 3.8 illustrate the behavior of some of the more important variables encountered in macroeconomic analysis. Casual inspection does have its perils and formal testing is necessary to substantiate any first impressions. However, the strong visual pattern is that these series are not stationary; the sample means do not appear to be constant and/or there is the strong appearance of heteroskedasticity. We can characterize the key features of the various series with these "stylized facts":

Figure 3.1 U.S. GNP (1985 prices). Horizontal axis: 1960 to 1990.

1. Most of the series contain a clear trend. Real GNP and its subcomponents and the supplies of short-term financial instruments exhibit a decidedly upward trend. For some series (interest and inflation rates), the positive trend is interrupted by a marked decline, followed by a resumption of the positive growth. Nevertheless, it is hard to maintain that these series do have a time-invariant mean. As such, they are not stationary.


Figure 3.2 Investment and government consumption (1985 prices). Series: government consumption and investment, in billions; horizontal axis: 1960 to 1990.

Figure 3.3 Checkable deposits and money market instruments. Series: checkable deposits and money market instruments, in billions of $; horizontal axis: 1965 to 1991.

2. Some series seem to meander. The pound/dollar exchange rate shows no particular tendency to increase or decrease. The pound seems to go through sustained periods of appreciation and then depreciation with no tendency to revert to a long-run mean. This type of "random walk" behavior is typical of nonstationary series.

3. Any shock to a series displays a high degree of persistence. Notice that the Federal Funds Rate experienced a violent upward surge in 1973 and remained at the higher level for nearly 2 years. In the same way, U.K. industrial production plummeted in the late 1970s, not returning to its previous level until the mid-1980s.

4. The volatility of many series is not constant over time. During the 1970s, U.S. producer prices fluctuated wildly as compared with the 1960s and 1980s. Real investment grew smoothly throughout most of the 1960s, but became highly variable in the 1970s also. Such series are called conditionally heteroskedastic if the unconditional (or long-run) variance is constant but there are periods in which the variance is relatively high.

5. Some series share comovements with other series. Large shocks to U.S. industrial production appear to be timed similarly to those in the U.K. and Canada. Short- and long-term interest rates track each other quite closely. The presence of such comovements should not be too surprising. We might expect that the underlying economic forces affecting U.S. industry also affect industry internationally.

Figure 3.4 U.S. money supply: M2. Vertical axis: trillions of $.

Please be aware that "eyeballing" the data is not a substitute for formally testing for the presence of conditional heteroskedasticity or nonstationary behavior.² Although most of the variables shown in the figures are probably nonstationary, the issue will not always be so obvious. Fortunately, it is possible to modify the tools developed in the last chapter to help in the identification and estimation of such series. The remainder of this chapter considers the issue of conditional heteroskedasticity and presents simple models of trending variables. Formal tests for the presence of trends (either deterministic and/or stochastic) are contained in the next chapter. The issue of comovements must wait until Chapter 6.

Figure 3.5 Short- and long-term U.S. interest rates. Series: federal funds and 10-year bond; horizontal axis: 1960 to 1990.

Figure 3.6 U.S. price indices (percent change). Series: GNP deflator and producer prices, annual rates; horizontal axis: 1960 to 1990.


2. ARCH PROCESSES


In conventional econometric models, the variance of the disturbance term is assumed to be constant. However, Figures 3.1 through 3.8 demonstrate that many economic time series exhibit periods of unusually large volatility followed by periods of relative tranquility. In such circumstances, the assumption of a constant variance (homoskedasticity) is inappropriate. It is easy to imagine instances in which you might want to forecast the conditional variance of a series. As an asset holder, you would be interested in forecasts of the rate of return and its variance over the holding period. The unconditional variance (i.e., the long-run forecast of the variance) would be unimportant if you plan to buy the asset at t and sell at t + 1.

One approach to forecasting the variance is to explicitly introduce an independent variable that helps to predict the volatility. Consider the simplest case in which

$$y_{t+1} = \epsilon_{t+1}x_t$$

where $y_{t+1}$ = the variable of interest
$\epsilon_{t+1}$ = a white-noise disturbance term with variance $\sigma^2$
$x_t$ = an independent variable that can be observed at period t

If $x_t = x_{t-1} = x_{t-2} = \cdots$ = constant, the $\{y_t\}$ sequence is the familiar white-noise process with a constant variance. However, when the realizations of the $\{x_t\}$ sequence are not all equal, the variance of $y_{t+1}$ conditional on the observable value of $x_t$ is

$$\operatorname{Var}(y_{t+1} \mid x_t) = x_t^2\sigma^2$$

Figure 3.7 Exchange rate indices (currency/dollar). Series: U.K., Canada, and Japan; 1971 = 1.00; horizontal axis: 1971 to 1989.

Here, the conditional variance of $y_{t+1}$ is dependent on the realized value of $x_t$. Since you can observe $x_t$ at time period t, you can form the variance of $y_{t+1}$ conditionally on the realized value of $x_t$. If the magnitude $(x_t)^2$ is large (small), the variance of $y_{t+1}$ will be large (small) as well. Furthermore, if the successive values of $\{x_t\}$ exhibit positive serial correlation (so that a large value of $x_t$ tends to be followed by a large value of $x_{t+1}$), the conditional variance of the $\{y_t\}$ sequence will exhibit positive serial correlation as well. In this way, the introduction of the $\{x_t\}$ sequence can explain periods of volatility in the $\{y_t\}$ sequence. In practice, you might want to modify the basic model by introducing the coefficients $a_0$ and $a_1$ and estimating the regression equation in logarithmic form as

$$\ln(y_t) = a_0 + a_1\ln(x_{t-1}) + e_t$$

where $e_t$ = the error term [formally, $e_t = \ln(\epsilon_t)$]
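The dependence of the conditional variance on $x_t$ can be checked with a small simulation. The sketch below is illustrative only: the normal distribution for the disturbance, the value of sigma, the two values of $x_t$, and the function name are all assumptions made for the example.

```python
import random

def simulate_y_given_x(x_t, sigma, n_draws, rng):
    """Draw y_{t+1} = eps_{t+1} * x_t many times for one fixed, observed x_t."""
    return [rng.gauss(0.0, sigma) * x_t for _ in range(n_draws)]

rng = random.Random(12345)   # fixed seed so the run is repeatable
sigma = 1.5                  # assumed standard deviation of the white-noise term

comparison = {}
for x_t in (0.5, 2.0):       # a "tranquil" and a "volatile" value of x_t (assumed)
    draws = simulate_y_given_x(x_t, sigma, 20000, rng)
    # The mean of y is zero by construction, so the variance is the mean square.
    sample_var = sum(d * d for d in draws) / len(draws)
    theory_var = x_t**2 * sigma**2          # Var(y_{t+1} | x_t) = x_t^2 * sigma^2
    comparison[x_t] = (sample_var, theory_var)
    print(x_t, round(sample_var, 3), theory_var)
```

Periods in which $x_t$ is large in absolute value produce a widely dispersed $y_{t+1}$, even though the disturbance itself has a constant variance.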


The procedure is simple to implement since the logarithmic transformation results in a linear regression equation; OLS can be used to estimate $a_0$ and $a_1$ directly. A major difficulty with this strategy is that it assumes a specific cause for the changing variance. Often, you may not have a firm theoretical reason for selecting one candidate for the $\{x_t\}$ sequence over other reasonable choices. Was it the oil price shocks, a change in the conduct of monetary policy, and/or the breakdown of the Bretton-Woods system that was responsible for the volatile WPI during the 1970s? Moreover, the technique necessitates a transformation of the data such that the resulting series has a constant variance. In the example at hand, the $\{e_t\}$ sequence is assumed to have a constant variance. If this assumption is violated, some other transformation of the data is necessary.


ARCH Processes

Instead of using ad hoc variable choices for $x_t$ and/or data transformations, Engle (1982) shows that it is possible to simultaneously model the mean and variance of a series. As a preliminary step to understanding Engle's methodology, note that conditional forecasts are vastly superior to unconditional forecasts. To elaborate, suppose you estimate the stationary ARMA model $y_t = a_0 + a_1y_{t-1} + \epsilon_t$ and want to forecast $y_{t+1}$. The conditional forecast of $y_{t+1}$ is:

$$E_ty_{t+1} = a_0 + a_1y_t$$

If we use this conditional mean to forecast $y_{t+1}$, the forecast error variance is $E_t[(y_{t+1} - a_0 - a_1y_t)^2] = E_t\epsilon_{t+1}^2 = \sigma^2$. Instead, if unconditional forecasts are used, the unconditional forecast is always the long-run mean of the $\{y_t\}$ sequence that is equal to $a_0/(1 - a_1)$. The unconditional forecast error variance is

$$E\{[y_{t+1} - a_0/(1 - a_1)]^2\} = E[(\epsilon_{t+1} + a_1\epsilon_t + a_1^2\epsilon_{t-1} + a_1^3\epsilon_{t-2} + \cdots)^2] = \sigma^2/(1 - a_1^2)$$

Figure 3.8 Industrial production. Index: 1985 = 100.

Since $1/(1 - a_1^2) > 1$, the unconditional forecast has a greater variance than the conditional forecast. Thus, conditional forecasts (since they take into account the known current and past realizations of series) are preferable.
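The comparison of the two forecast error variances can be made concrete with a few lines of code. This is a minimal sketch; the function names and the values $a_1 = 0.9$ and $\sigma^2 = 1$ are illustrative assumptions, not values taken from the text.

```python
def conditional_forecast_error_variance(sigma2):
    """One-step-ahead error variance when forecasting with a0 + a1*y_t."""
    return sigma2

def unconditional_forecast_error_variance(a1, sigma2, n_terms=None):
    """Error variance when always forecasting the long-run mean a0/(1 - a1).

    With n_terms=None the closed form sigma2/(1 - a1**2) is used; otherwise the
    series sigma2*(1 + a1**2 + a1**4 + ...) is truncated after n_terms terms.
    """
    if abs(a1) >= 1:
        raise ValueError("the AR(1) model must be stationary: |a1| < 1")
    if n_terms is None:
        return sigma2 / (1.0 - a1**2)
    return sigma2 * sum(a1 ** (2 * i) for i in range(n_terms))

a1, sigma2 = 0.9, 1.0      # illustrative values (assumed)
cond = conditional_forecast_error_variance(sigma2)
uncond = unconditional_forecast_error_variance(a1, sigma2)
uncond_series = unconditional_forecast_error_variance(a1, sigma2, n_terms=500)
print(cond, uncond, uncond_series)
```

With a highly persistent series, the unconditional forecast error variance is several times the conditional one, which is the gain from using the current realization $y_t$.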

Similarly, if the variance of $\{\epsilon_t\}$ is not constant, you can estimate any tendency for sustained movements in the variance using an ARMA model. For example, let $\{\hat{\epsilon}_t\}$ denote the estimated residuals from the model $y_t = a_0 + a_1y_{t-1} + \epsilon_t$ so that the conditional variance of $y_{t+1}$ is

$$\operatorname{Var}(y_{t+1} \mid y_t) = E_t[(y_{t+1} - a_0 - a_1y_t)^2] = E_t\epsilon_{t+1}^2$$

Thus far, we have set $E_t\epsilon_{t+1}^2$ equal to $\sigma^2$. Now suppose that the conditional variance is not constant. One simple strategy is to model the conditional variance as an AR(q) process using the square of the estimated residuals:

$$\hat{\epsilon}_t^2 = \alpha_0 + \alpha_1\hat{\epsilon}_{t-1}^2 + \alpha_2\hat{\epsilon}_{t-2}^2 + \cdots + \alpha_q\hat{\epsilon}_{t-q}^2 + v_t \tag{3.1}$$

where $v_t$ = a white-noise process

If the values of $\alpha_1, \alpha_2, \ldots, \alpha_q$ all equal zero, the estimated variance is simply the constant $\alpha_0$. Otherwise, the conditional variance of $y_t$ evolves according to the autoregressive process given by (3.1). As such, you can use (3.1) to forecast the conditional variance at t + 1 as

$$E_t\hat{\epsilon}_{t+1}^2 = \alpha_0 + \alpha_1\hat{\epsilon}_t^2 + \alpha_2\hat{\epsilon}_{t-1}^2 + \cdots + \alpha_q\hat{\epsilon}_{t+1-q}^2$$


For this reason, an equation like (3.1) is called an autoregressive conditional heteroskedastic (ARCH) model. There are many possible applications for ARCH models since the residuals in (3.1) can come from an autoregression, an ARMA model, or a standard regression model.

In actuality, the linear specification of (3.1) is not the most convenient. The reason is that the model for $\{y_t\}$ and the conditional variance are best estimated simultaneously using maximum likelihood techniques. Instead of the specification given by (3.1), it is more tractable to specify $v_t$ as a multiplicative disturbance.

The simplest example from the class of multiplicative conditionally heteroskedastic models proposed by Engle (1982) is

$$\epsilon_t = v_t\sqrt{\alpha_0 + \alpha_1\epsilon_{t-1}^2} \tag{3.2}$$

where $v_t$ = white-noise process such that $\sigma_v^2 = 1$, $v_t$ and $\epsilon_{t-1}$ are independent of each other, and $\alpha_0$ and $\alpha_1$ are constants such that $\alpha_0 > 0$ and $0 < \alpha_1 < 1$.

Consider the properties of the $\{\epsilon_t\}$ sequence. Since $v_t$ is white-noise and independent of $\epsilon_{t-1}$, it is easy to show that the elements of the $\{\epsilon_t\}$ sequence have a mean of zero and are uncorrelated. The proof is straightforward. Take the unconditional expectation of $\epsilon_t$. Since $Ev_t = 0$, it follows that

$$E\epsilon_t = E[v_t(\alpha_0 + \alpha_1\epsilon_{t-1}^2)^{1/2}] = Ev_t\,E(\alpha_0 + \alpha_1\epsilon_{t-1}^2)^{1/2} = 0 \tag{3.3}$$

Since $Ev_tv_{t-i} = 0$, it also follows that

$$E\epsilon_t\epsilon_{t-i} = 0, \quad i \ne 0 \tag{3.4}$$

The derivation of the unconditional variance of $\epsilon_t$ is also straightforward. Square $\epsilon_t$ and take the unconditional expectation to form

$$E\epsilon_t^2 = E[v_t^2(\alpha_0 + \alpha_1\epsilon_{t-1}^2)] = Ev_t^2\,E(\alpha_0 + \alpha_1\epsilon_{t-1}^2)$$

Since $\sigma_v^2 = 1$ and the unconditional variance of $\epsilon_t$ is identical to that of $\epsilon_{t-1}$ (i.e., $E\epsilon_t^2 = E\epsilon_{t-1}^2$), the unconditional variance is

$$E\epsilon_t^2 = \alpha_0/(1 - \alpha_1) \tag{3.5}$$

Thus, the unconditional mean and variance are unaffected by the presence of the error process given by (3.2). Similarly, it is easy to show that the conditional mean of $\epsilon_t$ is equal to zero. Given that $v_t$ and $\epsilon_{t-1}$ are independent and $Ev_t = 0$, the conditional mean of $\epsilon_t$ is

$$E(\epsilon_t \mid \epsilon_{t-1}, \epsilon_{t-2}, \ldots) = Ev_t\,E(\alpha_0 + \alpha_1\epsilon_{t-1}^2)^{1/2} = 0$$

At this point, you might be thinking that the properties of the $\{\epsilon_t\}$ sequence are not affected by (3.2) since the mean is zero, the variance is constant, and all autocovariances are zero. However, the influence of (3.2) falls entirely on the conditional variance. Since $\sigma_v^2 = 1$, the variance of $\epsilon_t$ conditioned on the past history of $\epsilon_{t-1}, \epsilon_{t-2}, \ldots$ is

$$E[\epsilon_t^2 \mid \epsilon_{t-1}, \epsilon_{t-2}, \ldots] = \alpha_0 + \alpha_1\epsilon_{t-1}^2 \tag{3.6}$$

In (3.6), the conditional variance of $\epsilon_t$ is dependent on the realized value of $\epsilon_{t-1}^2$. If the realized value of $\epsilon_{t-1}^2$ is large, the conditional variance in t will be large as well. In (3.6) the conditional variance follows a first-order autoregressive process denoted by ARCH(1). As opposed to a usual autoregression, the coefficients $\alpha_0$ and $\alpha_1$ have to be restricted. In order to ensure that the conditional variance is never negative, it is necessary to assume that both $\alpha_0$ and $\alpha_1$ are positive. After all, if $\alpha_0$ is negative, a sufficiently small realization of $\epsilon_{t-1}$ will mean that (3.6) is negative. Similarly, if $\alpha_1$ is negative, a sufficiently large realization of $\epsilon_{t-1}$ can render a negative value for the conditional variance. Moreover, to ensure the stability of the autoregressive process, it is necessary to restrict $\alpha_1$ such that $0 < \alpha_1 < 1$.


Equations (3.3), (3.4), (3.5), and (3.6) illustrate the essential features of any ARCH process. In an ARCH model, the error structure is such that the conditional and unconditional means are equal to zero. Moreover, the $\{\epsilon_t\}$ sequence is serially uncorrelated since for all $s \ne 0$, $E\epsilon_t\epsilon_{t-s} = 0$. The key point is that the errors are not independent since they are related through their second moment (recall that correlation is a linear relationship). The conditional variance itself is an autoregressive process resulting in conditionally heteroskedastic errors. When the realized value of $\epsilon_{t-1}$ is far from zero (so that $\alpha_1(\epsilon_{t-1})^2$ is relatively large), the variance of $\epsilon_t$ will tend to be large. As you will see momentarily, the conditional heteroskedasticity in $\{\epsilon_t\}$ will result in $\{y_t\}$ being an ARCH process. Thus, the ARCH model is able to capture periods of tranquility and volatility in the $\{y_t\}$ series.
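These properties can be illustrated by simulating the ARCH(1) error process of (3.2). The sketch below is hedged in several ways: the normal distribution for $v_t$, the values $\alpha_0 = 1$ and $\alpha_1 = 0.2$, the burn-in length, and the helper names are all assumptions chosen for illustration.

```python
import random

def simulate_arch1(alpha0, alpha1, n, rng, burn_in=500):
    """Generate eps_t = v_t * sqrt(alpha0 + alpha1 * eps_{t-1}**2) with v_t ~ N(0, 1)."""
    eps_prev = 0.0
    out = []
    for t in range(n + burn_in):
        v = rng.gauss(0.0, 1.0)
        eps = v * (alpha0 + alpha1 * eps_prev**2) ** 0.5
        if t >= burn_in:            # discard start-up values affected by eps_0 = 0
            out.append(eps)
        eps_prev = eps
    return out

def autocorr(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [s - mean for s in series]
    num = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    den = sum(d * d for d in dev)
    return num / den

rng = random.Random(2024)
alpha0, alpha1 = 1.0, 0.2           # illustrative values (assumed)
eps = simulate_arch1(alpha0, alpha1, 100_000, rng)
eps_sq = [e * e for e in eps]

sample_var = sum(eps_sq) / len(eps_sq)   # compare with alpha0 / (1 - alpha1) from (3.5)
rho_levels = autocorr(eps, 1)            # near zero, consistent with (3.4)
rho_squares = autocorr(eps_sq, 1)        # positive: dependence through the second moment
print(round(sample_var, 3), round(rho_levels, 3), round(rho_squares, 3))
```

The levels of the simulated errors show essentially no serial correlation, while their squares are positively autocorrelated, which is the distinction between uncorrelated and independent errors drawn above.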

The four graphs of Figure 3.9 depict two different ARCH models. The upper-left-hand graph (a), representing the $\{v_t\}$ sequence, shows 100 serially uncorrelated and normally distributed random deviates. From casual inspection, the $\{v_t\}$ sequence appears to fluctuate around a mean of zero and have a constant variance. Note the moderate increase in volatility between periods 50 and 60. Given the initial condition $\epsilon_0 = 0$, these realizations of the $\{v_t\}$ sequence were used to construct the next 100 values of the $\{\epsilon_t\}$ sequence using equation (3.2) and setting $\alpha_0 = 1$ and $\alpha_1 = 0.8$. As illustrated in the upper-right-hand graph (b), the $\{\epsilon_t\}$ sequence also has a mean of zero, but the variance appears to experience an increase in volatility around t = 50.

How does the error structure affect the $\{y_t\}$ sequence? Clearly, if the autoregressive parameter $a_1$ is zero, $y_t$ is nothing more than $\epsilon_t$. Thus, the upper-right-hand graph can be used to depict the time path of the $\{y_t\}$ sequence for the case of $a_1 = 0$. The lower two graphs (c) and (d) show the behavior of the $\{y_t\}$ sequence for the cases of $a_1 = 0.2$ and 0.9, respectively. The essential point to note is that the ARCH error structure and autocorrelation parameters of the $\{y_t\}$ process interact with each other. Comparing the lower two graphs illustrates that the volatility of $\{y_t\}$ is increasing in $\alpha_1$ and $a_1$. The explanation is intuitive. Any unusually large (in absolute value) shock in $v_t$ will be associated with a persistently large variance in the $\{\epsilon_t\}$ sequence; the larger $\alpha_1$, the longer the persistence. Moreover, the greater the autoregressive parameter $a_1$, the more persistent any given change in $y_t$. The stronger the tendency for $\{y_t\}$ to remain away from its mean, the greater the variance.

Figure 3.9 Simulated ARCH processes. Panels (a) through (d); panel (a): white-noise process $v_t$.
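The construction described for Figure 3.9 can be sketched as follows. The code does not reproduce the book's figure, since the actual random draws are unknown; the seed, the zero intercept $a_0 = 0$, the starting value $y_0 = 0$, and the function name are assumptions made for illustration.

```python
import random

def simulate_figure_paths(a1_values, alpha0=1.0, alpha1=0.8, n=100, seed=0):
    """Return the v, eps, and y paths for each AR coefficient in a1_values.

    eps_t = v_t * sqrt(alpha0 + alpha1 * eps_{t-1}**2), with eps_0 = 0
    y_t   = a1 * y_{t-1} + eps_t, with y_0 = 0 and intercept a0 = 0 (assumed)
    """
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]   # panel (a): white noise

    eps, eps_prev = [], 0.0
    for v_t in v:                                  # panel (b): ARCH(1) errors
        eps_t = v_t * (alpha0 + alpha1 * eps_prev**2) ** 0.5
        eps.append(eps_t)
        eps_prev = eps_t

    y_paths = {}
    for a1 in a1_values:                           # panels (c), (d): AR(1) with ARCH errors
        y, y_prev = [], 0.0
        for eps_t in eps:
            y_t = a1 * y_prev + eps_t
            y.append(y_t)
            y_prev = y_t
        y_paths[a1] = y
    return v, eps, y_paths

v, eps, y_paths = simulate_figure_paths([0.0, 0.2, 0.9])
for a1, path in y_paths.items():
    print(a1, round(max(abs(val) for val in path), 2))
```

All three $y_t$ paths are driven by the same $\{\epsilon_t\}$ sequence, so any difference between them comes only from the autoregressive parameter $a_1$.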


To formally examine the properties of the $\{y_t\}$ sequence, the conditional mean and variance are given by

$$E_{t-1}y_t = a_0 + a_1y_{t-1}$$

and

$$
\begin{aligned}
\operatorname{Var}(y_t \mid y_{t-1}, y_{t-2}, \ldots) &= E_{t-1}(y_t - a_0 - a_1y_{t-1})^2 \\
&= E_{t-1}(\epsilon_t)^2 \\
&= \alpha_0 + \alpha_1(\epsilon_{t-1})^2
\end{aligned}
$$

Since $\alpha_1$ and $\epsilon_{t-1}^2$ cannot be negative, the minimum value for the conditional variance is $\alpha_0$. For any nonzero realization of $\epsilon_{t-1}$, the conditional variance of $y_t$ is positively related to $\alpha_1$. The unconditional mean and variance of $y_t$ can be obtained by solving the difference equation for $y_t$ and then taking expectations. If the process began sufficiently far in the past (so that the arbitrary constant A can safely be ignored), the solution for $y_t$ is

$$y_t = a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i\epsilon_{t-i} \tag{3.7}$$

Since $E\epsilon_t = 0$ for all t, the unconditional expectation of (3.7) is $Ey_t = a_0/(1 - a_1)$. The unconditional variance can be obtained in a similar fashion using (3.7). Given that $E\epsilon_t\epsilon_{t-i}$ is zero for all $i \ne 0$, the unconditional variance of $y_t$ follows directly from (3.7) as

$$\operatorname{Var}(y_t) = \sum_{i=0}^{\infty} a_1^{2i}\operatorname{var}(\epsilon_{t-i})$$

From the result that the unconditional variance of $\epsilon_t$ is constant [i.e., $\operatorname{var}(\epsilon_t) = \operatorname{var}(\epsilon_{t-1}) = \operatorname{var}(\epsilon_{t-2}) = \cdots = \alpha_0/(1 - \alpha_1)$], it follows that

$$\operatorname{Var}(y_t) = [\alpha_0/(1 - \alpha_1)][1/(1 - a_1^2)]$$

Clearly, the variance of the $\{y_t\}$ sequence is increasing in both $\alpha_1$ and the absolute value of $a_1$. Although the algebra can be a bit tedious, the essential point is that the ARCH error process can be used to model periods of volatility within the univariate framework.
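As a worked illustration of the unconditional variance formula, take the values used for Figure 3.9, $\alpha_0 = 1$ and $\alpha_1 = 0.8$, together with the two autoregressive coefficients $a_1 = 0.2$ and $a_1 = 0.9$. The rounding to two decimals is ours.

```latex
\begin{aligned}
\operatorname{var}(\epsilon_t) &= \frac{\alpha_0}{1-\alpha_1} = \frac{1}{1-0.8} = 5 \\
a_1 = 0.2:\quad \operatorname{Var}(y_t) &= 5 \cdot \frac{1}{1-0.04} \approx 5.21 \\
a_1 = 0.9:\quad \operatorname{Var}(y_t) &= 5 \cdot \frac{1}{1-0.81} \approx 26.32
\end{aligned}
```

Raising $a_1$ from 0.2 to 0.9 multiplies the unconditional variance of $y_t$ roughly fivefold even though the error process is unchanged.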


The ARCH process given by (3.2) has been extended in several interesting ways. Engle's (1982) original contribution considered the entire class of higher-order ARCH(q) processes:

$$\epsilon_t = v_t\sqrt{\alpha_0 + \sum_{i=1}^{q}\alpha_i\epsilon_{t-i}^2} \tag{3.8}$$

In (3.8), all shocks from $\epsilon_{t-1}$ to $\epsilon_{t-q}$ have a direct effect on $\epsilon_t$ so that the conditional variance acts like an autoregressive process of order q. Question 2 at the end of this chapter asks you to demonstrate that the forecasts for $E_t\epsilon_{t+1}^2$ arising from (3.1) and (3.8) have precisely the same form.


The GARCH Model


Bollerslev (1986) extended Engle's original work by developing a technique that allows the conditional variance to be an ARMA process. Now let the error process be such that

$$\epsilon_t = v_t\sqrt{h_t}$$

where $\sigma_v^2 = 1$

and

$$h_t = \alpha_0 + \sum_{i=1}^{q}\alpha_i\epsilon_{t-i}^2 + \sum_{i=1}^{p}\beta_ih_{t-i} \tag{3.9}$$

Since $\{v_t\}$ is a white-noise process that is independent of past realizations of $\epsilon_{t-i}$, the conditional and unconditional means of $\epsilon_t$ are equal to zero. By taking the expected value of $\epsilon_t$, it is easy to verify that

$$E\epsilon_t = Ev_t(h_t)^{1/2} = 0$$

The important point is that the conditional variance of $\epsilon_t$ is given by $E_{t-1}\epsilon_t^2 = h_t$. Thus, the conditional variance of $\epsilon_t$ is given by $h_t$ in (3.9).

This generalized ARCH(p, q) model, called GARCH(p, q), allows for both autoregressive and moving average components in the heteroskedastic variance. If we set p = 0 and q = 1, it is clear that the first-order ARCH model given by (3.2) is simply a GARCH(0, 1) model. If all the $\beta_i$ equal zero, the GARCH(p, q) model is equivalent to an ARCH(q) model. The benefits of the GARCH model should be clear; a high-order ARCH model may have a more parsimonious GARCH representation that is much easier to identify and estimate. This is particularly true since all coefficients in (3.9) must be positive. Moreover, to ensure that the conditional variance is finite, all characteristic roots of (3.9) must lie inside the unit circle. Clearly, the more parsimonious model will entail fewer coefficient restrictions.³

The key feature of GARCH models is that the conditional variance of the disturbances of the $\{y_t\}$ sequence constitutes an ARMA process. Hence, it is to be expected that the residuals from a fitted ARMA model should display this characteristic pattern. To explain, suppose you estimate $\{y_t\}$ as an ARMA process. If your model of $\{y_t\}$ is adequate, the ACF and PACF of the residuals should be indicative of a white-noise process. However, the ACF of the squared residuals can help identify the order of the GARCH process. Since $E_{t-1}\epsilon_t^2 = h_t$, it is possible to rewrite (3.9) as

$$E_{t-1}\epsilon_t^2 = \alpha_0 + \sum_{i=1}^{q}\alpha_i\epsilon_{t-i}^2 + \sum_{i=1}^{p}\beta_ih_{t-i} \tag{3.10}$$

Equation (3.10) looks very much like an ARMA(q, p) process in the $\{\epsilon_t^2\}$ sequence. If there is conditional heteroskedasticity, the correlogram should be suggestive of such a process. The technique to construct the correlogram of the squared residuals is as follows:

STEP 1: Estimate the $\{y_t\}$ sequence using the "best-fitting" ARMA model (or regression model) and obtain the squares of the fitted errors $\hat{\epsilon}_t^2$. Also calculate the sample variance of the residuals ($\hat{\sigma}^2$) defined as

$$\hat{\sigma}^2 = \sum_{t=1}^{T}\hat{\epsilon}_t^2/T$$

where T = number of residuals

STEP 2: Calculate and plot the sample autocorrelations of the squared residuals as

$$\rho(i) = \frac{\sum_{t=i+1}^{T}(\hat{\epsilon}_t^2 - \hat{\sigma}^2)(\hat{\epsilon}_{t-i}^2 - \hat{\sigma}^2)}{\sum_{t=1}^{T}(\hat{\epsilon}_t^2 - \hat{\sigma}^2)^2}$$

STEP 3: In large samples, the standard deviation of $\rho(i)$ can be approximated by $T^{-1/2}$. Individual values of $\rho(i)$ with a value that is significantly different from zero are indicative of GARCH errors. Ljung-Box Q-statistics can be used to test for groups of significant coefficients. As in Chapter 2, the statistic

$$Q = T(T + 2)\sum_{i=1}^{n}\rho(i)^2/(T - i)$$

has an asymptotic $\chi^2$ distribution with n degrees of freedom if the $\hat{\epsilon}_t^2$ are uncorrelated. Rejecting the null hypothesis that the $\hat{\epsilon}_t^2$ are uncorrelated is equivalent to rejecting the null hypothesis of no ARCH or GARCH errors. In practice, you should consider values of n up to T/4.
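The three steps above can be sketched in code and applied to simulated GARCH(1, 1) errors. Everything specific here is an assumption for illustration: the function names, the normal distribution for $v_t$, the parameter values, and the choice to start the recursion at $\alpha_0/(1 - \alpha_1 - \beta_1)$, the long-run variance of a GARCH(1, 1) process. The four "toy" residuals are made up so that the arithmetic can be checked by hand.

```python
import random

def simulate_garch11(alpha0, alpha1, beta1, n, rng):
    """Simulate eps_t = v_t*sqrt(h_t), h_t = alpha0 + alpha1*eps_{t-1}**2 + beta1*h_{t-1}."""
    h_prev = alpha0 / (1.0 - alpha1 - beta1)   # assumed start-up value
    eps_prev = 0.0
    eps, h = [], []
    for _ in range(n):
        h_t = alpha0 + alpha1 * eps_prev**2 + beta1 * h_prev
        eps_t = rng.gauss(0.0, 1.0) * h_t ** 0.5
        eps.append(eps_t)
        h.append(h_t)
        eps_prev, h_prev = eps_t, h_t
    return eps, h

def squared_residual_correlogram(residuals, max_lag):
    """Steps 1 and 2: sample autocorrelations rho(1..max_lag) of the squared residuals."""
    e2 = [e * e for e in residuals]
    T = len(e2)
    sigma2_hat = sum(e2) / T                   # sample variance of the residuals
    dev = [x - sigma2_hat for x in e2]
    den = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - i] for t in range(i, T)) / den
            for i in range(1, max_lag + 1)]

def ljung_box_q(rho, T):
    """Step 3: Q = T(T+2) * sum over i of rho(i)^2 / (T - i)."""
    return T * (T + 2) * sum(r * r / (T - i) for i, r in enumerate(rho, start=1))

# Toy residuals whose squares alternate 1, 4, 1, 4 (made up for a hand check).
toy = [1.0, 2.0, -1.0, -2.0]
toy_rho = squared_residual_correlogram(toy, 2)
toy_q = ljung_box_q(toy_rho, len(toy))
print(toy_rho, toy_q)                          # [-0.75, 0.5] 7.5

# Simulated GARCH(1, 1) errors (parameter values are illustrative assumptions).
eps, h = simulate_garch11(0.5, 0.1, 0.5, 2000, random.Random(7))
rho = squared_residual_correlogram(eps, 8)
q_stat = ljung_box_q(rho, len(eps))
print([round(r, 3) for r in rho], round(q_stat, 2))
```

The resulting Q would be compared with a $\chi^2$ critical value with n degrees of freedom (n = 8 in the simulated case); the code leaves that comparison to a statistical table.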
The more formal Lagrange multiplier test for ARCH disturbances has 
` been proposed by Engle (1982). The methodology involves the Yöllowing 
two steps:" 


STEP 1: Usc OLS to estimate the most appropriate AR() (or regression) model: 


Yi = Qo + AY + Mg FF Wien Y Er 
STEP 2: Obtain the squares of the fitted errors ê i- Regress these squared residuals 
22 
on a constant and on the q lagged valucsê ê Chas Eny ee SES that is, es- 
timate 


a? a a2 
ê? = Oy + AE? +2 H + AE 3.11) 


If there are no ARCH or GARCH effects, the estimated values of $\alpha_1$ through $\alpha_q$ should be zero. Hence, this regression will have little explanatory power so that the coefficient of determination (i.e., the usual $R^2$-statistic) will be quite low. With a sample of T residuals, under the null hypothesis of no ARCH errors, the test statistic $TR^2$ converges to a $\chi^2$ distribution with q degrees of freedom. If $TR^2$ is sufficiently large, rejection of the null hypothesis that $\alpha_1$ through $\alpha_q$ are jointly equal to zero is equivalent to rejecting the null hypothesis of no ARCH errors. On the other hand, if $TR^2$ is sufficiently low, it is possible to conclude that there are no ARCH effects.
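The two-step Lagrange multiplier test can be illustrated with a small pure-Python sketch. Everything here is an assumption for illustration: the helper names `ols_r2` and `arch_lm_stat`, the simulated residuals, and the choice to count T as the number of usable observations in the auxiliary regression.

```python
import random

def ols_r2(y, X):
    """R^2 from an OLS regression of y on the columns of X (X includes the constant)."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] +
         [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))   # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    b = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def arch_lm_stat(resid, q):
    """TR^2 from regressing e_t^2 on a constant and q lags of e_t^2, as in (3.11)."""
    sq = [e * e for e in resid]
    y = sq[q:]
    X = [[1.0] + [sq[t - i] for i in range(1, q + 1)] for t in range(q, len(sq))]
    return len(y) * ols_r2(y, X)               # T taken as usable observations

# Hypothetical residuals with ARCH(1) errors (illustrative parameter values).
random.seed(0)
resid, prev = [], 0.0
for _ in range(300):
    prev = random.gauss(0.0, 1.0) * (1.0 + 0.5 * prev ** 2) ** 0.5
    resid.append(prev)

lm1 = arch_lm_stat(resid, 1)
lm4 = arch_lm_stat(resid, 4)
print("TR^2 (q=1):", round(lm1, 2), " TR^2 (q=4):", round(lm4, 2))
```

Each statistic would then be compared with a chi-square critical value with q degrees of freedom.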


3. ARCH AND GARCH ESTIMATES OF INFLATION 


ARCH and GARCH models have become very popular in that they enable the econometrician to estimate the variance of a series at a particular point in time. To illustrate the distinction between the conditional variance and the unconditional variance, consider the nature of the wage bargaining process. Clearly, firms and unions need to forecast the inflation rate over the duration of the labor contract. Economic theory suggests that the terms of the wage contract will depend on the inflation forecasts and uncertainty concerning the accuracy of these forecasts. Let $E_t\pi_{t+1}$ denote the conditional expected rate of inflation for t + 1 and $\sigma_{\pi t}^2$ the conditional variance. If parties to the contract have rational expectations, the terms of the contract will depend on $E_t\pi_{t+1}$ and $\sigma_{\pi t}^2$, as opposed to the unconditional mean or unconditional variance. Similarly, as mentioned above, asset pricing models indicate that the risk premium will depend on the expected return and variance of that return. The relevant risk measure is the risk over the holding period, not the unconditional risk.

The example illustrates a very important point. The rational expectations hypothesis asserts that agents do not waste useful information. In forecasting any time series, rational agents use the conditional distribution, rather than the unconditional distribution, of that series. Hence, any test of the wage bargaining model above that uses the historical variance of the inflation rate would be inconsistent with the notion that rational agents make use of all available information (i.e., conditional means and variances). A student of the "economics of uncertainty" can immediately see the importance of ARCH and GARCH models. Theoretical models using variance as a measure of risk (such as mean variance analysis) can be tested using the conditional variance. As such, the growth in the use of ARCH/GARCH methods has been nothing short of impressive.


Engle’s Model of U.K. Inflation 


Although Section 2 focused on the residuals of a pure ARMA model, it is possible to estimate the residuals of a standard multiple-regression model as ARCH or GARCH processes. In fact, Engle's (1982) seminal paper considered the residuals of the simple model of the wage/price spiral for the U.K. over the 1958:II to 1977:II period. Let $p_t$ denote the log of the U.K. consumer price index and $w_t$ the log of the index of nominal wage rates. Thus, the rate of inflation is $\pi_t = p_t - p_{t-1}$ and the real wage is $r_t = w_t - p_t$. Engle reports that after some experimentation, he chose the following model of the U.K. inflation rate (standard errors appear in parentheses):


$$\pi_t = \underset{(0.0057)}{0.0257} + \underset{(0.103)}{0.334}\pi_{t-1} + \underset{(0.110)}{0.408}\pi_{t-4} - \underset{(0.114)}{0.404}\pi_{t-5} + \underset{(0.0136)}{0.0559}r_{t-1} + \varepsilon_t$$

$$h_t = 0.000089 \tag{3.12}$$

where $h_t$ = the variance of $\{\varepsilon_t\}$

The nature of the model is such that increases in the previous period's real wage increase the current inflation rate. Lagged inflation rates at t − 4 and t − 5 are intended to capture seasonal factors. All coefficients have a t-statistic greater than 3.0, and a battery of diagnostic tests did not indicate the presence of serial correlation. The estimated variance was the constant value 8.9E-5. In testing for ARCH errors, the Lagrange multiplier test for ARCH(1) errors was not significant, but the test for an ARCH(4) error process yielded a value of $TR^2$ equal to 15.2. At the 0.01 significance level, the critical value of $\chi^2$ with four degrees of freedom is 13.28; hence, Engle concludes that there are ARCH errors.

Engle specified an ARCH(4) process forcing the following declining set of weights on the errors:

$$h_t = \alpha_0 + \alpha_1(0.4\varepsilon_{t-1}^2 + 0.3\varepsilon_{t-2}^2 + 0.2\varepsilon_{t-3}^2 + 0.1\varepsilon_{t-4}^2) \tag{3.13}$$

The rationale for choosing a two-parameter variance function was to ensure the nonnegativity and stationarity constraints that might not be satisfied using an unrestricted estimating equation. Given this particular set of weights, the necessary and sufficient conditions for the two constraints to be satisfied are $\alpha_0 > 0$ and $0 < \alpha_1 < 1$.

Engle shows that the estimation of the parameters of (3.12) and (3.13) can be considered separately without loss of asymptotic efficiency. One procedure is to estimate (3.12) using OLS and save the residuals. From these residuals, an estimate of the parameters of (3.13) can be constructed, and based on these estimates, new estimates of (3.12) can be obtained. To estimate both with full efficiency, continued iterations can be checked to determine whether the separate estimates are converging. Now that many statistical software packages contain nonlinear maximum likelihood estimation routines, the current procedure is to simultaneously estimate both equations using the methodology discussed in Section 7 below.

Engle's maximum likelihood estimates of the model are

$$\pi_t = \underset{(0.0049)}{0.0328} + \underset{(0.108)}{0.162}\pi_{t-1} + \underset{(0.089)}{0.264}\pi_{t-4} - \underset{(0.099)}{0.325}\pi_{t-5} + \underset{(0.0115)}{0.0707}r_{t-1} + \varepsilon_t$$

$$h_t = \underset{(8.5)}{1.4\text{E-}5} + \underset{(0.298)}{0.955}(0.4\varepsilon_{t-1}^2 + 0.3\varepsilon_{t-2}^2 + 0.2\varepsilon_{t-3}^2 + 0.1\varepsilon_{t-4}^2) \tag{3.14}$$

The estimated values of $h_t$ are one-step ahead forecast error variances. All coefficients (except the own lag of the inflation rate) are significant at conventional levels. For a given real wage, the point estimates of (3.14) imply that the inflation rate is a convergent process. Using the calculated values of the $\{h_t\}$ sequence, Engle finds that the standard deviation of inflation forecasts more than doubled as the economy moved from the "predictable sixties into the chaotic seventies." The point estimate of 0.955 indicates an extreme amount of persistence.


Bollerslev's Estimates of U.S. Inflation


Bollerslev's (1986) estimate of U.S. inflation provides an interesting comparison of a standard autoregressive time-series model (which assumes a constant variance), a model with ARCH errors, and a model with GARCH errors. He notes that the ARCH procedure has been useful in modeling different economic phenomena but points out (see pp. 307-308) that

Common to most ... applications, however, is the introduction of a rather arbitrary linear declining lag structure in the conditional variance equation to take account of the long memory typically found in empirical work, since estimating a totally free lag distribution often will lead to violation of the non-negativity constraints.

There is no doubt that the lag structure Engle used to model $h_t$ in (3.14) is subject to this criticism. Using quarterly data over the 1948:II to 1983:IV period, Bollerslev (1986) calculates the inflation rate ($\pi_t$) as the logarithmic change in the U.S. GNP deflator. He then estimates the autoregression:

$$\pi_t = \underset{(0.080)}{0.240} + \underset{(0.083)}{0.552}\pi_{t-1} + \underset{(0.089)}{0.177}\pi_{t-2} + \underset{(0.090)}{0.232}\pi_{t-3} - \underset{(0.080)}{0.209}\pi_{t-4} + \varepsilon_t$$

$$h_t = 0.282 \tag{3.15}$$

Equation (3.15) seems to have all the properties of a well-estimated time-series model. All coefficients are significant at conventional levels (the standard errors are in parentheses) and the estimated values of the autoregressive coefficients imply stationarity. Bollerslev reports that the ACF and PACF do not exhibit any significant coefficients at the 5% significance level. However, as is typical of ARCH errors, the ACF and PACF of the squared residuals (i.e., $\varepsilon_t^2$) show significant correlations. The Lagrange multiplier tests for ARCH(1), ARCH(4), and ARCH(8) errors are all highly significant.

Bollerslev next estimates the restricted ARCH(8) model originally proposed by Engle and Kraft (1983). By way of comparison to (3.15), he finds

$$\pi_t = \underset{(0.059)}{0.138} + \underset{(0.081)}{0.423}\pi_{t-1} + \underset{(0.108)}{0.222}\pi_{t-2} + \underset{(0.079)}{0.377}\pi_{t-3} - \underset{(0.104)}{0.175}\pi_{t-4} + \varepsilon_t$$

$$h_t = \underset{(0.033)}{0.058} + \underset{(0.265)}{0.802}\sum_{i=1}^{8}\frac{9-i}{36}\varepsilon_{t-i}^2 \tag{3.16}$$


Note that the autoregressive coefficients of (3.15) and (3.16) are similar. The models of the variance, however, are quite different. Equation (3.15) assumes a constant variance, whereas (3.16) assumes the variance ($h_t$) depends on a linearly declining weighted average of the squared errors in the previous eight quarters.

Hence, the inflation rate predictions of the two models should be similar, but the confidence intervals surrounding the forecasts will differ. Equation (3.15) yields a constant interval of unchanging width. Equation (3.16) yields a confidence interval that expands during periods of inflation volatility and contracts in relatively tranquil periods.
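To make the contrast concrete, the sketch below evaluates the restricted ARCH(8) variance with the point estimates 0.058 and 0.802 and the weights (9 − i)/36, and compares the implied two-standard-deviation band with the one from the constant variance 0.282. The two histories of squared errors are hypothetical values chosen only for illustration.

```python
# Weights (9 - i)/36 for i = 1, ..., 8 from the restricted ARCH(8) variance equation.
weights = [(9 - i) / 36 for i in range(1, 9)]

def h_arch8(past_sq_errors):
    """Conditional variance 0.058 + 0.802 * weighted average of the last eight
    squared errors; past_sq_errors[0] is the most recent squared error."""
    return 0.058 + 0.802 * sum(w * e2 for w, e2 in zip(weights, past_sq_errors))

H_CONSTANT = 0.282            # constant variance of the pure autoregression

tranquil = [0.05] * 8         # hypothetical squared errors in a quiet period
volatile = [1.00] * 8         # hypothetical squared errors in a turbulent period

for label, history in (("tranquil", tranquil), ("volatile", volatile)):
    band_arch = 2 * h_arch8(history) ** 0.5
    band_const = 2 * H_CONSTANT ** 0.5
    print(label, "ARCH(8) half-width:", round(band_arch, 3),
          "constant half-width:", round(band_const, 3))
```

Because the weights sum to one, the bracketed term is a true weighted average, so the ARCH band narrows below the constant band in quiet periods and widens above it in turbulent ones.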


In order to test for the presence of a first-order GARCH term in the conditional variance, it is possible to estimate the equation:

$$h_t = \alpha_0 + \alpha_1\sum_{i=1}^{8}\frac{9-i}{36}\varepsilon_{t-i}^2 + \beta_1 h_{t-1} \tag{3.17}$$

The finding that $\beta_1 = 0$ would imply an absence of a first-order moving average term in the conditional variance. Given the difficulties of estimating (3.17), Bollerslev (1986) uses the simpler Lagrange multiplier test. Formally, the test involves constructing the residuals of the conditional variance of (3.16). The next step is to regress these residuals on a constant and $h_{t-1}$; the expression $TR^2$ has a $\chi^2$ distribution with one degree of freedom. Bollerslev finds that $TR^2 = 4.57$; at the 5% significance level, he cannot reject the presence of a first-order GARCH process. He then estimates the following GARCH(1, 1) model:

$$\pi_t = \underset{(0.060)}{0.141} + \underset{(0.081)}{0.433}\pi_{t-1} + \underset{(0.110)}{0.229}\pi_{t-2} + \underset{(0.077)}{0.349}\pi_{t-3} - \underset{(0.104)}{0.162}\pi_{t-4} + \varepsilon_t$$

$$h_t = \underset{(0.006)}{0.007} + \underset{(0.070)}{0.135}\varepsilon_{t-1}^2 + \underset{(0.068)}{0.829}h_{t-1} \tag{3.18}$$

Diagnostic checks indicate that the ACF and PACF of the squared residuals do not reveal any coefficients exceeding $2T^{-1/2}$. LM tests for the presence of additional lags of $\varepsilon_t^2$ and for the presence of $h_{t-2}$ are not significant at the 5% level.
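As a worked illustration of what the GARCH(1, 1) point estimates 0.007, 0.135 and 0.829 imply, the sketch below computes the persistence of the variance process and the implied long-run variance. It relies on two standard GARCH(1, 1) results that are not derived in the text: the unconditional variance is the intercept divided by one minus the sum of the ARCH and GARCH coefficients, and the expected conditional variance reverts to that level at a rate equal to the sum. The starting value of 1.0 is hypothetical.

```python
ALPHA0, ALPHA1, BETA1 = 0.007, 0.135, 0.829    # point estimates of the variance equation

persistence = ALPHA1 + BETA1                    # must be below one for a finite variance
uncond_var = ALPHA0 / (1.0 - persistence)       # implied long-run variance

def expected_variance_path(h0, steps):
    """Expected conditional variance k steps ahead, starting from h0.

    Uses the standard recursion E[h_{k+1}] = alpha0 + (alpha1 + beta1) * E[h_k].
    """
    path = [h0]
    for _ in range(steps):
        path.append(ALPHA0 + persistence * path[-1])
    return path

path = expected_variance_path(1.0, 40)          # hypothetical high-volatility start
print("persistence:", round(persistence, 3))
print("long-run variance:", round(uncond_var, 4))
print("expected variance after 40 quarters:", round(path[-1], 4))
```

With a persistence this close to one, a volatility shock dies out slowly: the expected variance is still well above its long-run level after ten years of quarterly data.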


4. ESTIMATING A GARCH MODEL OF THE WPI: 
AN EXAMPLE 


To obtain a better idea of the actual process of fitting a GARCH model, reconsider the U.S. Wholesale Price Index data used in the last chapter. Recall that the Box-Jenkins approach led us to estimate a model of the U.S. rate of inflation ($\pi_t$) having the form:

$$\pi_t = a_0 + a_1\pi_{t-1} + \varepsilon_t + \beta_1\varepsilon_{t-1} + \beta_4\varepsilon_{t-4}$$

When we used the standard criteria of the Box-Jenkins procedure, the estimated model performed quite well. All estimated parameters were significant at conventional levels and both the AIC and SBC selected the ARMA[1, (1, 4)] specification. Diagnostic checks of the residuals did not indicate the presence of serial correlation and there was no evidence of structural change in the estimated coefficients. During the 1970s, however, there was a period of unusual volatility that is characteristic of a GARCH process. The aim of this section is to illustrate a step-by-step analysis of a GARCH estimation of the rate of inflation as measured by the WPI. The data series is contained in the file labeled WPI.WK1 on the data disk. Question 7 at the end of this chapter guides you through the estimation procedure reported below.

In the last chapter, some of the observations were not used in the estimation stage, so that out-of-sample forecasts could be performed. Estimating the same model over the entire 1960:I to 1992:II sample period yields

$$\pi_t = \underset{(0.0039)}{0.0101} + \underset{(0.0865)}{0.7875}\pi_{t-1} + \varepsilon_t - \underset{(0.1126)}{0.4374}\varepsilon_{t-1} + \underset{(0.0904)}{0.2957}\varepsilon_{t-4}$$

$$h_t = 1.9193\text{E-}4 \tag{3.19}$$

The ACF and PACF of the residuals do not indicate any sign of serial correlation. The only suspect autocorrelation coefficient is for lag 6; the value $\rho(6) = 0.1619$ is about 1.8 standard deviations from zero. All other autocorrelations and partial autocorrelations are less than 0.11. The Ljung-Box Q-statistics for lags of 12, 24, and 36 quarters are 8.47, 15.09, and 28.54; none of these values are significant at conventional levels.

Although the model appears adequate, the volatility during the 1970s suggests an examination of the ACF and PACF of the squared residuals. The autocorrelations of the squared residuals are such that $\rho(1) = 0.126$, $\rho(2) = 0.307$, $\rho(3) = 0.115$, and $\rho(4) = 0.292$. Other values for $\rho(i)$ are generally 0.10 or less. The Ljung-Box Q-statistics for the squared residuals are all highly significant; for example, Q(4) = 27.78 and Q(12) = 37.55, which are both significant at the 0.00001 level. At this point, one might be tempted to plot the ACF and PACF of the squared residuals and estimate the squared residuals using Box-Jenkins methods. The problem with this strategy is that the errors were not generated by the maximum likelihood technique and are not fully efficient. Hence, it is necessary to formally test for ARCH errors.


Alternative Estimates of the Model 


Next, let $\hat\varepsilon_t$ denote the residuals of (3.19) and consider the ARCH(q) model for lag lengths of 1, 4, and 8 quarters:

$$\hat\varepsilon_t^2 = \alpha_0 + \sum_{i=1}^{q}\alpha_i\hat\varepsilon_{t-i}^2 \tag{3.20}$$


If we estimate (3.20) using OLS, the calculated values of $TR^2$ for q = 1, 4, and 8 are 22.91, 35.70, and 37.60, respectively. Hence, there appear to be ARCH errors at the 1% significance level; the critical values of $\chi^2$ with one, four, and eight degrees of freedom are 6.63, 13.28, and 20.09, respectively. Since the values of $TR^2$ for q = 4 and q = 8 are similar, it seems worthwhile to pin down the lag length to an ARCH(4) process. A straightforward method is to estimate (3.20) for q = 8. In this instance, the F-test for the null hypothesis $\alpha_5 = \alpha_6 = \alpha_7 = \alpha_8 = 0$ cannot be rejected at conventional levels.

Another way to determine whether four versus eight lags are most appropriate is to use a Lagrange multiplier test. To use this test, estimate (3.20) with q = 4; let $\{\varepsilon_{4t}\}$ denote the residuals from this regression. To determine whether lags 5 through 8 contain significant explanatory power, use the $\{\varepsilon_{4t}\}$ sequence to estimate the regression:

$$\varepsilon_{4t} = \alpha_0 + \sum_{i=1}^{8}\alpha_i\hat\varepsilon_{t-i}^2$$

If lags 5 through 8 contain little explanatory power, $TR^2$ should be small. Regressing $\varepsilon_{4t}$ on a constant and eight lags of $\hat\varepsilon_t^2$ yielded a value of $TR^2 = 3.85$. With four degrees of freedom, 3.85 is far below the critical value of $\chi^2$; it seems plausible to conclude that the errors are characterized by an ARCH(4) process. The same procedure can be used to test whether the model is an ARCH(1) or ARCH(4) process. Now let $\{\varepsilon_{1t}\}$ denote the residuals of (3.20) estimated with q = 1. Regressing $\varepsilon_{1t}$ on a constant and four lags of $\hat\varepsilon_t^2$ yielded a value of $TR^2 = 16.32$. At the 0.001 significance level, the critical value of $\chi^2$ with three degrees of freedom is 16.27. Hence, it hardly seems plausible to conclude that an ARCH(1) characterizes the error process; lags 2 through 4 cannot be excluded from (3.20).
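The comparisons with chi-square critical values can be checked by computing upper-tail probabilities directly. The function below is a small sketch for integer degrees of freedom that uses only the standard library; the name `chi2_sf` is an assumption, and the statistics fed to it are the $TR^2$ values quoted in the text.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x), integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-y)                  # closed form for df = 2
    else:
        a, tail = 0.5, math.erfc(math.sqrt(y))       # closed form for df = 1
    # Step up two degrees of freedom at a time:
    # Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        tail += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return tail

# (description, TR^2 statistic, degrees of freedom) taken from the discussion above
tests = [
    ("ARCH(4) against no ARCH", 35.70, 4),
    ("ARCH(8) against no ARCH", 37.60, 8),
    ("lags 5-8 added to ARCH(4)", 3.85, 4),
    ("lags 2-4 added to ARCH(1)", 16.32, 3),
]
for label, stat, df in tests:
    print(f"{label}: TR^2 = {stat}, df = {df}, p-value = {chi2_sf(stat, df):.5f}")
```

A small p-value rejects the restriction being tested, so the first, second and fourth rows reject while the third does not, which is the pattern that leads to the ARCH(4) specification.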

Overall, these tests suggest estimating the inflation rate using an ARMA[1, (1, 4)] model by assuming an ARCH(4) error process. The results from a maximum likelihood estimation are

$$\pi_t = \underset{(0.0012)}{0.0021} + \underset{(0.1298)}{0.5723}\pi_{t-1} + \varepsilon_t - \underset{(0.1135)}{0.1189}\varepsilon_{t-1} + \underset{(0.0645)}{0.3108}\varepsilon_{t-4}$$

$$h_t = \underset{(0.0000)}{2.1247\text{E-}5} + \underset{(0.1384)}{0.1433}\varepsilon_{t-1}^2 + \underset{(0.1725)}{0.2270}\varepsilon_{t-2}^2 + \underset{(0.0709)}{0.0037}\varepsilon_{t-3}^2 + \underset{(0.2031)}{0.6755}\varepsilon_{t-4}^2 \tag{3.21}$$

Although each estimated coefficient has the correct sign, we should be somewhat concerned about the number of insignificant coefficients. Note that the estimated coefficient on $\varepsilon_{t-1}$ in the equation for $\pi_t$ is about one standard error from zero; we know, however, that eliminating this coefficient from the ARMA[1, (1, 4)] model of inflation leads to residuals that are serially correlated. Moreover, three values of the ARCH(4) error process are not significantly different from zero at conventional significance levels. The likely solution to the problem concerns the modeling of the ARCH process; perhaps a more parsimonious model is in order.


One approach to reducing the number of estimated parameters is to constrain the conditional variance to have the same declining weights given by (3.13). The maximum likelihood estimates of this constrained ARCH(4) process are

$$\pi_t = \underset{(0.0011)}{0.0011} + \underset{(0.0795)}{0.9201}\pi_{t-1} + \varepsilon_t - \underset{(0.1476)}{0.4304}\varepsilon_{t-1} + \underset{(0.0929)}{0.1198}\varepsilon_{t-4}$$

$$h_t = \underset{(1.08\text{E-}3)}{6.3767\text{E-}5} + \underset{(0.0795)}{0.5850}(0.4\varepsilon_{t-1}^2 + 0.3\varepsilon_{t-2}^2 + 0.2\varepsilon_{t-3}^2 + 0.1\varepsilon_{t-4}^2) \tag{3.22}$$

Here, the estimated parameters of the ARCH process are both positive and significantly different from zero. The estimated value of $\alpha_1$ (= 0.5850) implies that $h_t$ is convergent. The problem with this model is that the estimated value of $a_1$ (= 0.9201) is dangerously close to unity (implying a divergent process) and $\beta_4$ is significant at only the 0.19 level. Before contemplating the use of second differences or setting $\beta_4 = 0$ and eliminating $\varepsilon_{t-4}$ from the model, we should be concerned about the validity of the restricted error process. One way to proceed is to try alternative weighting patterns and select the "best" pattern. Of course, this approach is subject to Bollerslev's criticism of being completely ad hoc.


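A better alternative is to use a GARCH(1, 1) model. As a first step, the error process was estimated as

$$\pi_t = \underset{(0.0012)}{0.0013} + \underset{(0.1141)}{0.7968}\pi_{t-1} + \varepsilon_t - \underset{(0.1585)}{0.4014}\varepsilon_{t-1} + \underset{(0.1202)}{0.2356}\varepsilon_{t-4}$$

$$h_t = \underset{(9.34\text{E-}6)}{1.5672\text{E-}5} + \underset{(0.1067)}{0.2226}\varepsilon_{t-1}^2 + \underset{(0.1515)}{0.6633}h_{t-1} \tag{3.23}$$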


Notice that the point estimates of the parameters imply stationarity and all coefficients but the intercept term in the $\pi_t$ equation are significant at the 10% level. The value of the maximized likelihood function is greater for the GARCH(1, 1) model than pure ARCH processes even though all models were estimated over the same time period. The maximized values of the likelihood function for (3.21), (3.22), and (3.23) are 483.25, 491.83, and 496.98, respectively. Moreover, the GARCH(1, 1) model necessitates the estimation of only two parameters. Thus, the GARCH(1, 1) process yields the best fit.

Diagnostic tests did not indicate the need to include other lags in the GARCH(1, 1) model. The Lagrange multiplier tests for the presence of additional values of $\alpha_i$ or $\beta_i$ were insignificant at conventional levels. Since $h_t$ is an estimate of the conditional variance of $\pi_t$, $(h_{t+1})^{1/2}$ is the standard error of the one-step ahead forecast error of $\pi_{t+1}$. Figure 3.10 shows a band of $\pm 2(h_{t+1})^{1/2}$ surrounding the one-step ahead forecast of the WPI. In contrast to the assumption of a constant conditional variance, note that the band width increases in the mid-1970s and latter part of the 1980s.
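The construction of the forecast band can be sketched as follows, using the variance-equation point estimates of the GARCH(1, 1) model (1.5672E-5, 0.2226 and 0.6633). The point forecast, the two values of the latest shock and the starting variance are hypothetical inputs chosen only to show how the band responds to a large shock.

```python
A0, A1, B1 = 1.5672e-5, 0.2226, 0.6633     # variance equation of the GARCH(1, 1) model

def next_variance(eps_t, h_t):
    """One-step ahead conditional variance: h_{t+1} = a0 + a1 * eps_t^2 + b1 * h_t."""
    return A0 + A1 * eps_t ** 2 + B1 * h_t

def forecast_band(point_forecast, eps_t, h_t):
    """Return (lower, upper): the point forecast plus or minus 2 * sqrt(h_{t+1})."""
    half_width = 2.0 * next_variance(eps_t, h_t) ** 0.5
    return point_forecast - half_width, point_forecast + half_width

h_bar = A0 / (1.0 - A1 - B1)               # long-run variance, used as a starting value
calm = forecast_band(0.01, 0.005, h_bar)   # hypothetical small shock last quarter
shock = forecast_band(0.01, 0.040, h_bar)  # hypothetical large shock last quarter

print("band after a small shock:", tuple(round(v, 4) for v in calm))
print("band after a large shock:", tuple(round(v, 4) for v in shock))
```

The same point forecast is surrounded by a wider band after the large shock, which is the behavior visible in the mid-1970s portion of the figure.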


Figure 3.10 Two-standard-deviation forecast interval for the WPI. [Vertical axis: WPI, 1985 = 100, scaled from 20 to 140; horizontal axis: 1964 to 1992.]

Note: The band spans two standard deviations on either side of the one-step ahead forecast of the WPI.


5. A GARCH MODEL OF RISK


An interesting application of GARCH modeling is provided by Holt and Aradhyula (1990). Their theoretical framework stands in contrast to the cobweb model in that rational expectations are assumed to prevail in the agricultural sector. The aim of the study is to examine the extent to which producers in the U.S. broiler (i.e., chicken) industry exhibit risk-averse behavior. To this end, the supply function for the U.S. broiler industry takes the form:

$$q_t = a_0 + a_1 p_t^e - a_2 h_t - a_3\,pfeed_{t-1} + a_4\,hatch_{t-1} + a_5 q_{t-4} + \varepsilon_{1t} \tag{3.24}$$

where $q_t$ = quantity of broiler production (in millions of pounds) in t

$p_t^e$ = expected real price of broilers at t conditioned on the information at t − 1 (so that $p_t^e = E_{t-1}p_t$)

$h_t$ = expected variance of the price of broilers in t conditioned on the information at t − 1

$pfeed_{t-1}$ = real price of broiler feed (in cents per pound) at t − 1

$hatch_{t-1}$ = hatch of broiler-type chicks in commercial hatcheries (measured in thousands) in t − 1

$\varepsilon_{1t}$ = supply shock in t


and the length of the time period is one quarter.

The supply function is based on the biological fact that the production cycle of broilers is about 2 months. Since bimonthly data are unavailable, the model assumes that the supply decision is positively related to the price expectation formed by producers in the previous quarter. Given that feed accounts for the bulk of production costs, real feed prices lagged one quarter are negatively related to broiler production in t. Obviously, the hatch available in t − 1 increases the number of broilers that can be marketed in t. The fourth lag of broiler production is included to account for the possibility that production in any period may not fully adjust to the desired level of production.

For our purposes, the most interesting part of the study is the negative effect of the conditional variance of price on broiler supply. The timing of the production process is such that feed and other production costs must be incurred before output is sold in the market. In the planning stage, producers must forecast the price that will prevail 2 months hence. The greater $p_t^e$, the greater the number of chicks that will be fed and brought to market. If price variability is very low, these forecasts can be held with confidence. Increased price variability decreases the accuracy of the forecasts and decreases broiler supply. Risk-averse producers will opt to raise and market fewer broilers when the conditional volatility of price is high.

In the initial stage of the study, broiler prices are estimated as the AR(4) process:

$$(1 - \beta_1 L - \beta_2 L^2 - \beta_3 L^3 - \beta_4 L^4)p_t = \beta_0 + \varepsilon_{2t} \tag{3.25}$$

Ljung-Box Q-statistics for various lag lengths indicate that the residual series appear to be white-noise at the 5% level. However, the Q-statistic for the squared residuals, that is, the $\{\varepsilon_{2t}^2\}$, of 32.4 is significant at the 5% level. Thus, Holt and Aradhyula conclude that the variance of the price is heteroskedastic.

In the second stage of the study, several low-order GARCH estimates of (3.25) are compared. Goodness-of-fit statistics and significance tests suggest a GARCH(1, 1) process. In the third stage, the supply equation (3.24) and a GARCH(1, 1) process are simultaneously estimated. The estimated price equation (with standard errors in parentheses) is

$$(1 - \underset{(0.092)}{0.511}L - \underset{(0.098)}{0.129}L^2 - \underset{(0.094)}{0.130}L^3 - \underset{(0.073)}{0.138}L^4)p_t = \underset{(1.347)}{1.632} + \varepsilon_{2t} \tag{3.26}$$

$$h_t = \underset{(0.747)}{1.353} + \underset{(0.080)}{0.162}\varepsilon_{2t-1}^2 + \underset{(0.175)}{0.591}h_{t-1} \tag{3.27}$$

Equations (3.26) and (3.27) are well behaved in that (1) all estimated coefficients are significant at conventional significance levels; (2) all coefficients of the conditional variance equation are positive; and (3) the coefficients all imply convergent processes.

Holt and Aradhyula assume that producers use (3.26) and (3.27) to form their price expectations. Combining these estimates with (3.24) yields the supply equation


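$$q_t = \underset{(0.585)}{2.707}p_t^e - \underset{(0.344)}{0.521}h_t - \underset{(1.463)}{4.325}\,pfeed_{t-1} + \underset{(0.205)}{1.887}\,hatch_{t-1} + \underset{(0.065)}{0.603}q_{t-4} + \varepsilon_{1t} \tag{3.28}$$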


All estimated coefficients are significant at conventional levels and have the appropriate sign. An increase in the expected price increases broiler output. Increased uncertainty, as measured by conditional variance, acts to decrease output. This forward-looking rational expectations formulation is at odds with the more traditional cobweb model discussed in Chapter 1. In order to compare the two formulations, Holt and Aradhyula (1990) also consider an adaptive expectations formulation (see Exercise 2 in Chapter 1). Under adaptive expectations, price expectations are formed according to a weighted average of the previous period's price and the previous period's price expectation:


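$$p_t^e = \alpha p_{t-1} + (1-\alpha)p_{t-1}^e$$

or solving for $p_t^e$ in terms of the $\{p_t\}$ sequence, we obtain

$$p_t^e = \alpha\sum_{i=0}^{\infty}(1-\alpha)^i p_{t-1-i}$$

Similarly, the adaptive expectations formulation for conditional risk is given by

$$h_t = \beta\sum_{i=0}^{\infty}(1-\beta)^i (p_{t-1-i} - p_{t-1-i}^e)^2 \tag{3.29}$$

where $0 < \beta < 1$ and $(p_{t-1-i} - p_{t-1-i}^e)^2$ = the forecast error variance for period t − 1 − i.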
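The equivalence between the recursive and distributed-lag forms of adaptive expectations can be verified numerically. The sketch below is illustrative only: the function names, the short price series and the values of the two smoothing weights are hypothetical, and with a finite history the distributed lag carries an extra term for the initial expectation that vanishes as the history lengthens.

```python
def adaptive_expectations(prices, alpha, pe0):
    """Recursive form: each new expectation is alpha * last price
    plus (1 - alpha) * last expectation. Returns [pe_0, pe_1, ..., pe_n]."""
    pe = [pe0]
    for p_prev in prices:
        pe.append(alpha * p_prev + (1 - alpha) * pe[-1])
    return pe

def distributed_lag(prices, alpha, pe0, t):
    """Same expectation written as a geometrically weighted sum of past prices."""
    lag_sum = alpha * sum((1 - alpha) ** i * prices[t - 1 - i] for i in range(t))
    return lag_sum + (1 - alpha) ** t * pe0      # remainder from the finite history

def adaptive_risk(prices, pe, beta, h0=0.0):
    """Risk measure: geometrically weighted sum of past squared forecast errors,
    built recursively as beta * (last error)^2 + (1 - beta) * last risk."""
    h = [h0]
    for p_prev, pe_prev in zip(prices, pe):
        h.append(beta * (p_prev - pe_prev) ** 2 + (1 - beta) * h[-1])
    return h

prices = [10.0, 12.0, 11.0, 13.0]                # hypothetical price history
pe = adaptive_expectations(prices, alpha=0.5, pe0=10.0)
risk = adaptive_risk(prices, pe, beta=0.5)
print("expectations:", pe)
print("distributed-lag value for the last period:", distributed_lag(prices, 0.5, 10.0, 4))
print("risk measure:", risk)
```

Note that the risk series is driven by realized forecast errors, which is why the expected measure of risk need not equal the actual conditional variance.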


Note that in (3.29), the expected measure of risk as viewed by producers is not necessarily the actual conditional variance. The estimates of the two models differ concerning the implied long-run elasticities of supply with respect to expected price and conditional variance. Respectively, the estimated long-run elasticities of supply with respect to expected price are 0.587 and 0.399 in the rational expectations and adaptive expectations formulations. Similarly, rational and adaptive expectations formulations yield long-run supply elasticities of conditional variance of −0.030 and −0.013, respectively. Not surprisingly, the adaptive expectations model suggests a more sluggish supply response than the forward-looking rational expectations model.


6. THE ARCH-M MODEL 


Engle, Lilien, and Robins (1987) extend the basic ARCH framework to allow the mean of a sequence to depend on its own conditional variance. This class of model, called ARCH-M, is particularly suited to the study of asset markets. The basic insight is that risk-averse agents will require compensation for holding a risky asset. Given that an asset's riskiness can be measured by the variance of returns, the risk premium will be an increasing function of the conditional variance of returns. Engle, Lilien, and Robins express this idea by writing the excess return from holding a risky asset as

$$y_t = \mu_t + \varepsilon_t \tag{3.30}$$

where $y_t$ = excess return from holding a long-term asset relative to a one-period treasury bill

$\mu_t$ = risk premium necessary to induce the risk-averse agent to hold the long-term asset rather than the one-period bond

$\varepsilon_t$ = unforecastable shock to the excess return on the long-term asset

To explain (3.30), note that the expected excess return from holding the long-term asset must be just equal to the risk premium:¹¹

$$E_{t-1}y_t = \mu_t$$


Engle, Lilien, and Robins assume that the risk premium is an increasing function of the conditional variance of $\varepsilon_t$; in other words, the greater the conditional variance of returns, the greater the compensation necessary to induce the agent to hold the long-term asset. Mathematically, if $h_t$ is the conditional variance of $\varepsilon_t$, the risk premium can be expressed as

$$\mu_t = \beta + \delta h_t, \qquad \delta > 0 \tag{3.31}$$

where $h_t$ is the ARCH(q) process:

$$h_t = \alpha_0 + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2 \tag{3.32}$$

As a set, Equations (3.30), (3.31), and (3.32) constitute the basic ARCH-M model. From (3.30) and (3.31), the conditional mean of $y_t$ depends on the conditional variance $h_t$. From (3.32), the conditional variance is an ARCH(q) process. It should be pointed out that if the conditional variance is constant (i.e., if $\alpha_1 = \alpha_2 = \cdots = \alpha_q = 0$), the ARCH-M model degenerates into the more traditional case of a constant risk premium.

Figure 3.11 illustrates two different ARCH-M processes. The upper-left-hand graph (a) of the figure shows 60 realizations of a simulated white-noise process denoted by $\{\varepsilon_t\}$. Note the temporary increase in volatility during periods 20 to 30. By initializing $\varepsilon_0 = 0$, the conditional variance was constructed as the first-order ARCH process:

$$h_t = 1 + 0.65\varepsilon_{t-1}^2$$


As you can see in the upper-right-hand graph (b), the volatility in $\{\varepsilon_t\}$ translates itself into increases in the conditional variance. Note that large positive and negative realizations of $\varepsilon_{t-1}$ result in a large value of $h_t$; it is the square of each $\{\varepsilon_t\}$ realization that enters the conditional variance. In the lower left graph (c), the values of $\beta$ and $\delta$ are set equal to −4 and +4, respectively. As such, the $y_t$ sequence is constructed as $y_t = -4 + 4h_t + \varepsilon_t$. You can clearly see that $y_t$ is above its long-run value during the period of volatility. In the simulation, conditional volatility translates itself into increases in the values of $\{y_t\}$. In the latter portion of the sample, the volatility of $\{\varepsilon_t\}$ diminishes and the values of $y_t$ up through $y_{60}$ fluctuate around their long-run mean.

The lower-right-hand graph (d) reduces the influence of ARCH-M effects by reducing the magnitude of $\delta$ and $\beta$ (see Exercise 5 at the end of this chapter). Obviously, if $\delta = 0$, there are no ARCH-M effects at all. As you can see by comparing the two lower graphs, $y_t$ more closely mimics the $\varepsilon_t$ sequence when the magnitude of $\delta$ is diminished from $\delta = 4$ to $\delta = 1$.
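A simulation in the spirit of the one just described can be sketched in a few lines. This is not the author's data: the random draws, the seed and the threefold scaling used to create a burst of volatility in periods 20 to 30 are assumptions, and the intercept of −1 for the second series is read from the panel title rather than stated in the text.

```python
import random

def simulate_arch_m(eps, beta, delta, a0=1.0, a1=0.65, eps0=0.0):
    """Build h_t = a0 + a1 * eps_{t-1}^2 and y_t = beta + delta * h_t + eps_t."""
    h, y, prev = [], [], eps0
    for e in eps:
        h_t = a0 + a1 * prev ** 2      # conditional variance uses the lagged shock
        h.append(h_t)
        y.append(beta + delta * h_t + e)
        prev = e
    return h, y

random.seed(1)
eps = [random.gauss(0.0, 1.0) for _ in range(60)]
for t in range(20, 30):                # hypothetical burst of volatility
    eps[t] *= 3.0

h, y_c = simulate_arch_m(eps, beta=-4.0, delta=4.0)   # strong ARCH-M effect
_, y_d = simulate_arch_m(eps, beta=-1.0, delta=1.0)   # weaker ARCH-M effect

gap_c = max(abs(yc - e) for yc, e in zip(y_c, eps))
gap_d = max(abs(yd - e) for yd, e in zip(y_d, eps))
print("largest |y - eps| with delta = 4:", round(gap_c, 2))
print("largest |y - eps| with delta = 1:", round(gap_d, 2))
```

The gap between $y_t$ and $\varepsilon_t$ equals the risk premium in absolute value, so it shrinks when the coefficient on $h_t$ is reduced, which is why the second series tracks the shocks more closely.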

As in ARCH or GARCH models, the form of an ARCH-M model can be determined using Lagrange multiplier tests exactly as in (3.11). The LM tests are relatively simple to conduct since they do not require estimation of the full model. The statistic $TR^2$ is asymptotically distributed as $\chi^2$ with degrees of freedom equal to the number of restrictions.

Figure 3.11 Simulated ARCH-M processes. [Four panels, horizontal axis 0 to 60: (a) White noise process; (b) $h_t = \alpha_0 + \alpha_1(\varepsilon_{t-1})^2$; (c) $y_t = -4 + 4h_t + \varepsilon_t$; (d) $y_t = -1 + h_t + \varepsilon_t$.]


Implementation


Using quarterly data from 1960:I to 1984:II, Engle, Lilien, and Robins (1987) constructed the excess yield on 6-month treasury bills as follows. Let $r_t$ denote the quarterly yield on a 3-month treasury bill held from t to (t + 1). Rolling over all proceeds, at the end of two quarters an individual investing $1 at the beginning of period t will have $(1 + r_t)(1 + r_{t+1})$ dollars. In the same fashion, if $R_t$ denotes the quarterly yield on a 6-month treasury bill, buying and holding the 6-month bill for the full two quarters will result in $(1 + R_t)^2$ dollars. The excess yield due to holding the 6-month bill is approximately

$$y_t = 2R_t - r_{t+1} - r_t \tag{3.33}$$
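The word "approximately" can be made precise with a short calculation: the exact difference between the two gross returns and the linear expression differ only by second-order terms in the yields. The yields used below are hypothetical quarterly rates chosen for illustration.

```python
def exact_excess(R, r_t, r_next):
    """Exact difference in two-quarter gross returns: (1 + R)^2 - (1 + r_t)(1 + r_next)."""
    return (1 + R) ** 2 - (1 + r_t) * (1 + r_next)

def approx_excess(R, r_t, r_next):
    """First-order approximation: 2R - r_next - r_t."""
    return 2 * R - r_next - r_t

# Hypothetical quarterly yields: 2.1% on the 6-month bill, 2.0% then 1.9% on 3-month bills.
R, r_t, r_next = 0.021, 0.020, 0.019
exact = exact_excess(R, r_t, r_next)
approx = approx_excess(R, r_t, r_next)
print("exact:", round(exact, 6), "approximation:", round(approx, 6),
      "difference:", round(exact - approx, 6))
```

Expanding the products shows that the difference is exactly R squared minus the product of the two short yields, which is negligible at quarterly yield levels.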


The results from regressing the excess yield on a constant are (the t-statistic is in parentheses)

$$y_t = \underset{(4.04)}{0.142} + \varepsilon_t \tag{3.34}$$

The excess yield of 0.142% per quarter is over four standard deviations from zero. The problem with this estimation method is that the post-1979 period showed markedly higher volatility than the earlier sample period. To test for the presence of ARCH errors, the squared residuals were regressed on a weighted average of past squared residuals as in (3.13). The LM test for the restriction $\alpha_1 = 0$ yields a value of $TR^2 = 10.1$, which has a $\chi^2$ distribution with one degree of freedom. At the 1% significance level, the critical value of $\chi^2$ with one degree of freedom is 6.635; hence, there is strong evidence of heteroskedasticity. Thus, there appear to be ARCH errors so that (3.34) is misspecified if individuals demand a risk premium.


The maximum likelihood estimates of the ARCH-M model and associated t-statistics are

$$y_t = \underset{(-1.29)}{-0.0241} + \underset{(5.15)}{0.687}h_t + \varepsilon_t$$

$$h_t = \underset{(1.08)}{0.0023} + \underset{(6.30)}{1.64}(0.4\varepsilon_{t-1}^2 + 0.3\varepsilon_{t-2}^2 + 0.2\varepsilon_{t-3}^2 + 0.1\varepsilon_{t-4}^2)$$

The estimated coefficients imply a time-varying risk premium. The estimated parameter of the ARCH equation of 1.64 implies that the unconditional variance is infinite. Although this is somewhat troublesome, the conditional variance is finite. Shocks to $\varepsilon_{t-1}$ act to increase the conditional variance so that there are periods of tranquility and volatility. During volatile periods, the risk premium rises as risk-averse agents seek assets that are conditionally less risky.

The next section considers some of the mechanics involved in estimating an ARCH-M model. Exercise 8 at the end of this chapter asks you to estimate such an ARCH-M model using simulated data. The questions are designed to guide you through a typical estimation procedure.


7. MAXIMUM LIKELIHOOD ESTIMATION OF GARCH AND
ARCH-M MODELS


Many software packages contain built-in routines that estimate GARCH and
ARCH-M models such that the researcher simply specifies the order of the process
and the computer does the rest. Even if you have access to an automated routine, it
is important to understand the numerical procedures used by your software package.
Other packages require user input in the form of a small optimization algorithm.
This section explains the maximum likelihood methods required to understand
and write a program for ARCH-type models.

Suppose that values of $\{y_t\}$ are drawn from a normal distribution having a mean
$\mu$ and constant variance $\sigma^2$. From standard distribution theory, the log likelihood
function using T independent observations is


$$\log \mathcal{L} = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=1}^{T}(y_t - \mu)^2$$


where $\log \mathcal{L}$ = log of the likelihood function


The procedure in maximum likelihood estimation is to select the distributional
parameters so as to maximize the likelihood of drawing the observed sample. In the
example at hand, the problem is to maximize $\log \mathcal{L}$ with respect to $\mu$ and $\sigma^2$. The
first-order conditions are


$$\frac{\partial \log \mathcal{L}}{\partial \mu} = \frac{1}{\sigma^2}\sum_{t=1}^{T}(y_t - \mu)$$


and


$$\frac{\partial \log \mathcal{L}}{\partial \sigma^2} = -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{t=1}^{T}(y_t - \mu)^2$$


Setting these partial derivatives equal to zero and solving for the values of $\mu$ and
$\sigma^2$ that yield the maximum value of $\log \mathcal{L}$ (denoted by $\hat{\mu}$ and $\hat{\sigma}^2$), we get


$$\hat{\mu} = \sum_{t=1}^{T} y_t / T$$


and


$$\hat{\sigma}^2 = \sum_{t=1}^{T}(y_t - \hat{\mu})^2 / T$$


Thus, with sample data, the maximum likelihood estimate of the mean is $\hat{\mu}$ and
the maximum likelihood estimate of the variance is $\hat{\sigma}^2$. The same principle applies
in a regression analysis. Suppose that $\{\varepsilon_t\}$ is generated by the following model:


$$\varepsilon_t = y_t - \beta x_t$$


In the classical regression model, the mean of $\varepsilon_t$ is assumed to be zero, the
variance is the constant $\sigma^2$, and the various realizations of $\{\varepsilon_t\}$ are independent. If we
use a sample with T observations, the log likelihood equation is a simple modification
of the above:


$$\log \mathcal{L} = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=1}^{T}(y_t - \beta x_t)^2$$


Maximizing the likelihood equation with respect to $\sigma^2$ and $\beta$ yields


$$\frac{\partial \log \mathcal{L}}{\partial \sigma^2} = -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{t=1}^{T}(y_t - \beta x_t)^2$$


and


$$\frac{\partial \log \mathcal{L}}{\partial \beta} = \frac{1}{\sigma^2}\sum_{t=1}^{T}(y_t x_t - \beta x_t^2)$$


Setting these partial derivatives equal to zero and solving for the values of $\beta$ and
$\sigma^2$ that yield the maximum value of $\log \mathcal{L}$ results in the familiar OLS estimates of
the variance and $\beta$ (denoted by $\hat{\sigma}^2$ and $\hat{\beta}$). Hence,


$$\hat{\sigma}^2 = \sum_{t=1}^{T}\hat{\varepsilon}_t^2 / T$$


and


$$\hat{\beta} = \sum_{t=1}^{T} x_t y_t \Big/ \sum_{t=1}^{T} x_t^2$$



The point to emphasize here is that the first-order conditions are easily solved since they are all linear.
Calculating the appropriate sums may be tedious, but the methodology is straightforward.
Unfortunately, this is not the case in estimating an ARCH-type model
since the first-order equations are nonlinear. Instead, the solution requires some sort
of search algorithm. The simplest way to illustrate the issue is to introduce an
ARCH(1) error process into the regression model. Continue to assume that $\varepsilon_t$ is
generated by the linear equation $\varepsilon_t = y_t - \beta x_t$. Now let $\varepsilon_t$ be given by (3.2):


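$$\varepsilon_t = v_t(\alpha_0 + \alpha_1\varepsilon_{t-1}^2)^{0.5}$$

so that the conditional variance of $\varepsilon_t$ is

$$h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2$$

Although the conditional variance of $\varepsilon_t$ is not constant, the necessary modifications
are clear. Since each realization of $\varepsilon_t$ has the conditional variance $h_t$, the
appropriate log likelihood function is


$$\log \mathcal{L} = -\frac{T}{2}\ln(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\ln h_t - \frac{1}{2}\sum_{t=1}^{T}\frac{(y_t - \beta x_t)^2}{h_t}$$


where $h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 = \alpha_0 + \alpha_1(y_{t-1} - \beta x_{t-1})^2$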


Finally, it is possible to combine the above and then to maximize $\log \mathcal{L}$ with
respect to $\alpha_0$, $\alpha_1$, and $\beta$. Fortunately, computers are able to select the parameter
values that maximize this log likelihood function. In most time-series software
packages, the procedure necessary to write such programs is quite simple. For example,
RATS uses a typical set of statements to estimate this ARCH(1) model.
Consider:


NONLIN β α₀ α₁

FRML ε = y − βx

FRML h = α₀ + α₁*ε²ₜ₋₁

FRML LIKELIHOOD = −0.5*[log(hₜ) + (ε²ₜ/hₜ)]

COMPUTE β = initial guess, α₀ = initial guess, α₁ = initial guess
MAXIMIZE(RECURSIVE) LIKELIHOOD 2 end


The first statement prepares the program to estimate a nonlinear model. The
second statement sets up the formula (FRML) for $\varepsilon_t$; $\varepsilon_t$ is defined to be $y_t - \beta x_t$. The
third statement sets up the formula for $h_t$ as an ARCH(1) process. The fourth
statement is the key to understanding the program. The formula LIKELIHOOD defines
the log likelihood for observation t; the program "understands" that it will
maximize this sum over all T − 1 observations. Note that the constant term
$-(T/2)\log(2\pi)$ is excluded from the definition of LIKELIHOOD; a constant has no
effect on the solution to an optimization problem. The program requires initial
guesses for $\beta$, $\alpha_0$, and $\alpha_1$. In practice, a reasonable initial guess for $\beta$ could come
from an OLS regression of $\{y_t\}$ on $\{x_t\}$. The initial guess for $\alpha_0$ could be the
variance of the residuals estimated from this OLS regression. After all, if there is no
ARCH effect, OLS and the maximum likelihood methods are identical. The initial
guess for $\alpha_1$ could be a small positive number. The final statement tells the
program to maximize LIKELIHOOD from observation 2 (since the initial observation
is lost) to the end of the sample.

It is possible to estimate more sophisticated models using a comparable procedure.
The key to writing a successful program is to correctly specify the error
process and variance. To estimate the ARMA[1, (1, 4)]-ARCH(4) model of the
inflation rate given by (3.22), lines 3 and 4 of the program would be replaced
with:


FRML ε = πₜ − a₀ − a₁πₜ₋₁ − b₁εₜ₋₁ − b₄εₜ₋₄
FRML h = α₀ + α₁(0.4ε²ₜ₋₁ + 0.3ε²ₜ₋₂ + 0.2ε²ₜ₋₃ + 0.1ε²ₜ₋₄)


Here, the first formula statement defines $\varepsilon_t$ as the residual from an ARMA[1, (1,
4)] process. The second statement constrains the lagged coefficients to exhibit a
smooth decay. Similarly, the GARCH(1, 1) version of this same model (see
(3.23)) uses the program steps:


FRML ε = πₜ − a₀ − a₁πₜ₋₁ − b₁εₜ₋₁ − b₄εₜ₋₄
FRML h = α₀ + α₁ε²ₜ₋₁ + β₁hₜ₋₁


The program steps for the ARCH-M model of Engle, Lilien, and Robins (1987) are:


FRML ε = y − a₀ − a₁h
FRML h = α₀ + α₁(0.4ε²ₜ₋₁ + 0.3ε²ₜ₋₂ + 0.2ε²ₜ₋₃ + 0.1ε²ₜ₋₄)


The first statement defines $\varepsilon_t$ as the value of $y_t$ less $a_0 + a_1 h_t$, a linear function
of the conditional variance. The second statement defines the conditional variance.

Finally, it is possible to include explanatory variables in the formula for the
conditional variance. In the GARCH(1, 1) inflation model, it is possible to write


FRML h = α₀ + α₁ε²ₜ₋₁ + β₁hₜ₋₁ + β₂zₜ


where z is an explanatory variable for h.
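
The ARCH(1) likelihood that the first program maximizes can be sketched in Python. This is an illustration under stated assumptions, not the routine RATS uses: the names `arch1_loglik` and `grid_search`, the sample data, and the grids are hypothetical, and a crude grid search stands in for the numerical optimizer. As in the program, the constant term is dropped and the sum starts at observation 2.

```python
import math


def arch1_loglik(beta, a0, a1, y, x):
    """Sum over t = 2..T of -0.5 * [log(h_t) + e_t**2 / h_t], constant term omitted."""
    e = [yt - beta * xt for yt, xt in zip(y, x)]   # e_t = y_t - beta * x_t
    total = 0.0
    for t in range(1, len(e)):                      # first observation is lost to the lag
        h = a0 + a1 * e[t - 1] ** 2                 # ARCH(1) conditional variance
        total += -0.5 * (math.log(h) + e[t] ** 2 / h)
    return total


def grid_search(y, x, beta, a0_grid, a1_grid):
    """Return (loglik, a0, a1) for the best pair on the grid, holding beta fixed."""
    best = None
    for a0 in a0_grid:
        for a1 in a1_grid:
            ll = arch1_loglik(beta, a0, a1, y, x)
            if best is None or ll > best[0]:
                best = (ll, a0, a1)
    return best


# Hypothetical data; beta is fixed at its OLS value as the text suggests for a starting guess.
y = [1.2, 0.7, 2.9, 1.1, 3.8, 2.2, 4.1, 5.3]
x = [1.0, 1.0, 2.0, 1.0, 3.0, 2.0, 3.0, 4.0]
beta_ols = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
best_ll, best_a0, best_a1 = grid_search(y, x, beta_ols, [0.1, 0.2, 0.4, 0.8], [0.0, 0.2, 0.4, 0.6])
print(best_ll, best_a0, best_a1)
```

A real application would search over all three parameters jointly with a gradient-based or derivative-free optimizer; the grid only shows that the objective must be evaluated repeatedly because the first-order conditions are nonlinear.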




8. DETERMINISTIC AND STOCHASTIC TRENDS 


It is helpful to represent the general solution to a linear stochastic difference
equation as consisting of three distinct parts:¹⁰


$$y_t = \text{trend} + \text{seasonal} + \text{irregular}$$


We have examined how ARMA (p, q) techniques can be used to model the irreg- 
ular and seasonal components. GARCH and ARCH-M models try to capture the 
tendency of economic time series to exhibit periods of sustained volatility. The 
other distinguishing feature of Figures 3.1 through 3.8 is that the series appear to be 
nonstationary. The mean values for GNP and its subcomponents, the supplies of the 
financial instruments, and industrial production levels generally appear to be in- 
creasing over time. The exchange rate series shown in Figure 3.7 have no obvious 
tendency for mean reversion. 

For some series, such as GNP, the sustained upward trend might be captured by a
simple linear time trend. Such an assumption is controversial, however, since it
implies a deterministic long-run growth rate of the real economy. Adherents to the
"real business cycle" school argue that technological advancements have permanent
effects on the trend of the macroeconomy. Given that technological innovations
are stochastic, the trend should reflect this underlying randomness. As such, it
will be useful to consider models with stochastic and deterministic trends.

A critical task for econometricians is to develop simple stochastic difference
equation models that can mimic the behavior of trending variables. The key feature
of a trend is that it has a permanent effect on a series. Since the irregular component
is stationary, the effects of any irregular components will "die out" while the
trending elements will remain in long-term forecasts. Examples of models with
deterministic trends include


$$y_t = a_0 + a_1 t + \varepsilon_t \qquad \text{(linear time trend)}$$

$$y_t = a_0 + a_1 t + a_2 t^2 + \cdots + a_n t^n + \varepsilon_t \qquad \text{(polynomial time trend)}$$


Either of these equations can be augmented with lagged values of the $\{y_t\}$
sequence and/or the $\{\varepsilon_t\}$ sequence. However, models with stochastic trends are
probably less familiar to you. The remainder of this section develops time-series
models exhibiting a stochastic trend.


The Random Walk Model 


Let the current value of $y_t$ be equal to last period's value plus a white-noise term:

$$y_t = y_{t-1} + \varepsilon_t \quad (\text{or } \Delta y_t = \varepsilon_t)$$


The random walk model is clearly a special case of the AR(1) process $y_t = a_0 +
a_1 y_{t-1} + \varepsilon_t$ when $a_0 = 0$ and $a_1 = 1$. Suppose you were betting on the outcome of a
coin toss, and a head added one dollar to your wealth while a tail cost you one dollar. We could let
$\varepsilon_t = +1$ if a head appears and $\varepsilon_t = -1$ in the event of a tail. Thus, your current wealth
($y_t$) equals last period's wealth ($y_{t-1}$) plus the realized value of $\varepsilon_t$. If you play again,
your wealth in t + 1 is $y_{t+1} = y_t + \varepsilon_{t+1}$.
If $y_0$ is a given initial condition, it is readily verified that the general solution to
the first-order difference equation represented by the random walk model is


$$y_t = y_0 + \sum_{i=1}^{t}\varepsilon_i$$
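
The coin-toss example can be simulated in a few lines. This Python sketch is only an illustration: the function name `simulate_random_walk`, the seed, and the number of tosses are assumptions. It builds the path recursively and shows that the final value equals the initial condition plus the sum of all shocks, as the general solution states.

```python
import random


def simulate_random_walk(y0, shocks):
    """Return [y_0, y_1, ..., y_T] where y_t = y_{t-1} + e_t."""
    path = [y0]
    for e in shocks:
        path.append(path[-1] + e)
    return path


rng = random.Random(0)                               # fixed seed so the run is repeatable
shocks = [rng.choice([1, -1]) for _ in range(100)]   # +1 for a head, -1 for a tail
wealth = simulate_random_walk(0, shocks)

# General solution: y_t = y_0 + sum of the first t shocks.
print(wealth[-1], 0 + sum(shocks))
```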


Taking expected values, we obtain $E(y_t) = E(y_{t-s}) = y_0$; thus, the mean of a
random walk is a constant. However, all stochastic shocks have nondecaying effects
on the $\{y_t\}$ sequence. Given the first t realizations of the $\{\varepsilon_t\}$ process, the
conditional mean of $y_{t+1}$ is


$$E_t y_{t+1} = E_t(y_t + \varepsilon_{t+1}) = y_t$$


Similarly, the conditional mean of $y_{t+s}$ (for any s > 0) can be obtained from


$$y_{t+s} = y_t + \sum_{i=1}^{s}\varepsilon_{t+i}$$


so that


$$E_t y_{t+s} = y_t + E_t\sum_{i=1}^{s}\varepsilon_{t+i} = y_t$$


The conditional means for all values of $y_{t+s}$ for all positive values of s are equal
to $y_t$. However, an $\varepsilon_t$ shock has a nondecaying effect on the $\{y_t\}$ sequence so that
the $\{y_t\}$ sequence is permanently influenced by an $\varepsilon_t$ shock. Notice that the variance
is time-dependent. Recall that


$$\text{var}(y_t) = \text{var}(\varepsilon_t + \varepsilon_{t-1} + \cdots + \varepsilon_1) = t\sigma^2$$


so


$$\text{var}(y_{t-s}) = \text{var}(\varepsilon_{t-s} + \varepsilon_{t-s-1} + \cdots + \varepsilon_1) = (t - s)\sigma^2$$


Since the variance is not constant [i.e., $\text{var}(y_t) \neq \text{var}(y_{t-s})$], the random walk
process is nonstationary. Moreover, as $t \to \infty$, the variance of $y_t$ also approaches
infinity. Thus, the random walk meanders without exhibiting any tendency to
increase or decrease.

[Figure 3.12: (a) Random walk model. (b) Random walk plus drift. (c) Random walk plus noise. (d) Local linear trend model.]

It is also instructive to calculate the covariance of $y_t$ and $y_{t-s}$.
Since the mean is constant, we can form the covariance $\gamma_{t-s}$ as


$$E[(y_t - y_0)(y_{t-s} - y_0)] = E[(\varepsilon_t + \varepsilon_{t-1} + \cdots + \varepsilon_1)(\varepsilon_{t-s} + \varepsilon_{t-s-1} + \cdots + \varepsilon_1)]$$
$$= E[(\varepsilon_{t-s})^2 + (\varepsilon_{t-s-1})^2 + \cdots + (\varepsilon_1)^2]$$
$$= (t - s)\sigma^2$$


To form the correlation coefficient $\rho_s$, we can divide $\gamma_{t-s}$ by the product of the
standard deviation of $y_t$ multiplied by the standard deviation of $y_{t-s}$. Thus, the
correlation coefficient $\rho_s$ is


$$\rho_s = \frac{t - s}{\sqrt{t(t - s)}} = \left[\frac{t - s}{t}\right]^{0.5} \tag{3.35}$$
This result plays an important role in the detection of nonstationary series. For
the first few autocorrelations, the sample size t will be large relative to the number
of autocorrelations formed; for small values of s, the ratio (t − s)/t is approximately
equal to unity. However, as s increases, the values of $\rho_s$ will decline. Hence, in
using sample data, the autocorrelation function for a random walk process will show
a slight tendency to decay. Thus, it will not be possible to use the ACF to
distinguish between a unit root process ($a_1 = 1$) and processes such that $a_1$ is close to
unity. In the Box-Jenkins identification stage, a slowly decaying ACF or PACF can
be an indication of nonstationarity.

Graph (a) in Figure 3.12 shows the time path of a simulated random walk [...]
positive trend in the simulated data. The reason for the upward trend is that the
values of the deviates used in this small sample of 100 do not precisely conform to
[...] process is false and serves as a reminder against relying solely on casual
inspection.


The Forecast Function


Suppose you collected a sample of values $y_0$ through $y_t$ and wanted to forecast
future values of the data series. From the perspective of time period t, the optimal
forecast of $y_{t+s}$ is the mean value of $y_{t+s}$ conditioned on the information available at
t:


$$E_t y_{t+s} = y_t$$


Hence, the constant value of $y_t$ is the unbiased estimator of all future values of
$y_{t+s}$ for all s > 0. To interpret, note that an $\varepsilon_t$ shock has a permanent effect on $y_t$. The
impact multiplier of $\varepsilon_t$ on $y_t$ (i.e., $\partial y_t/\partial \varepsilon_t$) is the same as the multiplier of $\varepsilon_t$ on all
$y_{t+s}$. This permanence is directly reflected in the forecasting function for $y_{t+s}$. In the
time-series literature, such a sequence is said to have a stochastic trend since the
expression $\sum\varepsilon_i$ imparts a permanent, albeit random, change in the conditional mean
of the series. Note that the random walk model seems to approximate the behavior
of the exchange rates shown in Figure 3.7. The various exchange rate series have
no particular tendency to increase or decrease over time; neither do they exhibit any
tendency to revert to a given mean value.
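
The slow decay of the random walk correlation coefficient in (3.35) is easy to tabulate. The Python sketch below is illustrative: the function name `random_walk_rho` and the choice t = 100 are assumptions made for the example.

```python
import math


def random_walk_rho(t, s):
    """Theoretical correlation between y_t and y_{t-s} for a random walk, as in (3.35)."""
    return math.sqrt((t - s) / t)


t = 100
for s in (1, 5, 10, 20, 50):
    # With t large relative to s, the correlation stays close to one.
    print(s, round(random_walk_rho(t, s), 3))
```

Because the values fall only gradually as s rises, a sample correlogram computed from such a series tends to look much like that of a stationary AR(1) process whose coefficient is close to unity.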




The Random Walk plus Drift Model

Now let the change in $y_t$ be partially deterministic and partly stochastic. The
random walk plus drift model augments the random walk model by adding a constant
term $a_0$:


$$y_t = y_{t-1} + a_0 + \varepsilon_t \tag{3.36}$$


Given the initial condition $y_0$, the general solution for $y_t$ is


$$y_t = y_0 + a_0 t + \sum_{i=1}^{t}\varepsilon_i \tag{3.37}$$


Here, the behavior of $y_t$ is governed by two nonstationary components: a linear
deterministic trend and the stochastic trend $\sum\varepsilon_i$. If we take expectations, the mean
of $y_t$ is $y_0 + a_0 t$ and the mean of $y_{t+s}$ is $Ey_{t+s} = y_0 + a_0(t + s)$. To explain, the
deterministic change in each realization of $\{y_t\}$ is $a_0$; after t periods, the cumulated
change is $a_0 t$. In addition, there is the stochastic trend $\sum\varepsilon_i$; each $\varepsilon_i$ shock has a
permanent effect on the mean of $y_t$. Notice that the first difference of the series is
stationary; taking the first difference yields the stationary sequence $\Delta y_t = a_0 + \varepsilon_t$.

Graph (b) of Figure 3.12 illustrates a simulated random walk plus drift model.
The value of $a_0$ was set equal to unity and (3.37) simulated using the same 100
deviates used for the random walk model above. Clearly, the deterministic time trend
dominates the time path of the series. In a very large sample, asymptotic theory
suggests this will always be the case. However, you should not conclude that it is
always easy to discern the difference between a random walk model and a model
with drift. In a small sample, increasing the variance of $\{\varepsilon_t\}$ or decreasing the
absolute value of $a_0$ could cloud the long-run properties of the sequence. Notice that
the pattern evident in the random walk plus drift model looks strikingly similar to
many of the series (including the money supply and real GNP) shown in Figures
3.1 through 3.8.


The Forecast Function


Updating (3.37) by s periods yields


$$y_{t+s} = y_0 + a_0(t + s) + \sum_{i=1}^{t+s}\varepsilon_i = y_t + a_0 s + \sum_{i=1}^{s}\varepsilon_{t+i}$$


Taking the conditional expectation of $y_{t+s}$, we get


$$E_t y_{t+s} = y_t + a_0 s$$


In contrast to the pure random walk model, the forecast function is not flat. The
fact that the mean change in $y_t$ is always the constant $a_0$ is reflected in the forecast.
In addition to the given value of $y_t$, we project this deterministic change s times into
the future. Thus, the model does not contain an irregular component; the random
walk plus drift contains only a deterministic trend and stochastic trend.


The Random Walk plus Noise Model


In the random walk plus noise model, $y_t$ is the sum of a stochastic trend and
white-noise component. Formally, this third model is represented by


$$y_t = \mu_t + \eta_t \tag{3.38}$$

and

$$\mu_t = \mu_{t-1} + \varepsilon_t \tag{3.39}$$


where $\{\eta_t\}$ is a white-noise process with variance $\sigma_\eta^2$, and $\varepsilon_t$ and $\eta_{t-s}$ are
independently distributed for all t and s [i.e., $E(\varepsilon_t\eta_{t-s}) = 0$].

It is easy to verify that the $\{\mu_t\}$ sequence represents the stochastic trend. Given
the initial condition for $\mu_0$, the solution for $\mu_t$ is


$$\mu_t = \mu_0 + \sum_{i=1}^{t}\varepsilon_i$$


Combining this expression with the noise term yields


$$y_t = \mu_0 + \sum_{i=1}^{t}\varepsilon_i + \eta_t$$


Now recognize that in period zero, the value of $y_0$ is given by $y_0 = \mu_0 + \eta_0$, so
that the solution for the random walk plus noise model can be written as


$$y_t = y_0 - \eta_0 + \sum_{i=1}^{t}\varepsilon_i + \eta_t \tag{3.40}$$


The key properties of the random walk plus noise model are as follows:


1. The unconditional mean of the $\{y_t\}$ sequence is constant: $Ey_t = y_0 - \eta_0$ and
updating by s periods yields $Ey_{t+s} = y_0 - \eta_0$. Notice that the successive $\varepsilon_t$ shocks
have permanent effects on the $\{y_t\}$ sequence in that there is no decay factor on
past values of $\varepsilon_{t-i}$. Hence, $y_t$ has the stochastic trend $\mu_t$.


2. The $\{y_t\}$ sequence has a pure noise component in that the $\{\eta_t\}$ sequence has
only a temporary effect on the $\{y_t\}$ sequence. The current realization of $\eta_t$
affects only $y_t$ but not the subsequent values $y_{t+s}$.


3. The variance of $\{y_t\}$ is not constant: $\text{var}(y_t) = t\sigma^2 + \sigma_\eta^2$ and
$\text{var}(y_{t-s}) = (t - s)\sigma^2 + \sigma_\eta^2$. As in the other models with a stochastic trend, the variance of $y_t$ approaches
infinity as t increases. The presence of the noise component means that the
correlation coefficient between $y_t$ and $y_{t-s}$ is smaller than for the pure random walk
model. Hence, the sample correlogram will exhibit even faster decay than in the
pure random walk model. To derive this result, note that the covariance between
$y_t$ and $y_{t-s}$ is


$$\text{cov}(y_t, y_{t-s}) = E[(y_t - y_0 + \eta_0)(y_{t-s} - y_0 + \eta_0)]$$
$$= E[(\varepsilon_1 + \varepsilon_2 + \varepsilon_3 + \cdots + \varepsilon_t + \eta_t)(\varepsilon_1 + \varepsilon_2 + \cdots + \varepsilon_{t-s} + \eta_{t-s})]$$


Since $\{\varepsilon_t\}$ and $\{\eta_t\}$ are independent white-noise sequences,


$$\text{cov}(y_t, y_{t-s}) = (t - s)\sigma^2$$


Thus, the correlation coefficient $\rho_s$ is


$$\rho_s = \frac{(t - s)\sigma^2}{\sqrt{(t\sigma^2 + \sigma_\eta^2)[(t - s)\sigma^2 + \sigma_\eta^2]}}$$


Comparison with (3.35), that is, $\rho_s$ for the random walk model, verifies that
the autocorrelations for the random walk plus noise model are always smaller for
$\sigma_\eta^2 > 0$.

Consider graph (c) of Figure 3.12 that shows a random walk plus noise model.
The series was simulated by setting $\eta_0 = 0$ and drawing a second 100 normally
distributed random deviates to represent the $\eta_t$ series. For each value of t, $\eta_t - \eta_0$
was added to the value of $y_t$ calculated for the random walk model. If we compare
parts (a) and (c) of the figure, it is seen that the two series track each other quite
well. The random walk plus noise model could mimic the same set of macroeconomic
variables as the random walk model. The effect of the "noise" component,
$\{\eta_t\}$, is to increase the variance of $\{y_t\}$ without affecting its long-run behavior.
After all, the random walk plus noise series is nothing more than the random walk
model with a purely temporary component added.
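
The way the three simulated series relate to one another can be sketched in Python. This is an illustration, not the code behind Figure 3.12: the function name `simulate_models`, the seed, and the unit variances are assumptions. The same shocks drive all three series, the drift is set to unity, and the noise series adds the difference between the current and initial noise draws to the random walk, following (3.40).

```python
import random


def simulate_models(eps, eta, y0=0.0, a0=1.0):
    """Build random walk, random walk plus drift, and random walk plus noise paths.

    eps holds the shocks e_1..e_T; eta holds the noise terms eta_0..eta_T.
    """
    T = len(eps)
    rw, drift = [y0], [y0]
    for t in range(1, T + 1):
        rw.append(rw[-1] + eps[t - 1])                # y_t = y_{t-1} + e_t
        drift.append(drift[-1] + a0 + eps[t - 1])     # y_t = y_{t-1} + a0 + e_t
    # (3.40): y_t = y_0 - eta_0 + sum(e_i) + eta_t
    noise = [rw[t] + eta[t] - eta[0] for t in range(T + 1)]
    return rw, drift, noise


rng = random.Random(42)
T = 100
eps = [rng.gauss(0.0, 1.0) for _ in range(T)]
eta = [0.0] + [rng.gauss(0.0, 1.0) for _ in range(T)]   # eta_0 = 0, as in the text
rw, drift, noise = simulate_models(eps, eta)
print(rw[-1], drift[-1], noise[-1])
```

By construction the drift series exceeds the random walk by exactly the cumulated deterministic change, while the noise series differs from the random walk only by the current temporary component.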


The Forecast Function


To find the forecast function, update (3.40) by s periods to obtain


$$y_{t+s} = y_0 - \eta_0 + \sum_{i=1}^{t+s}\varepsilon_i + \eta_{t+s}$$
$$= y_t - \eta_t + \sum_{i=1}^{s}\varepsilon_{t+i} + \eta_{t+s}$$

Taking the conditional expectation, we get


$$E_t y_{t+s} = y_t - \eta_t$$


Thus, the random walk plus noise model contains both a trend and an irregular
component. Certainly, $\eta_t$ has only a temporary effect on $y_t$; the forecast of $y_{t+s}$ is the
current value $y_t$ less the temporary component $\eta_t$. The permanent component of
$\{y_t\}$ is the stochastic trend $\sum\varepsilon_i$.

As an exercise, it is useful to show that the random walk plus noise model can
also be written in the form


$$y_t = y_{t-1} + \varepsilon_t + \eta_t - \eta_{t-1}$$


The proof is straightforward since (3.38) can be written in first differences as
$\Delta y_t = \Delta\mu_t + \Delta\eta_t$. Given (3.39), $\Delta\mu_t = \varepsilon_t$, so that $\Delta y_t = \varepsilon_t + \Delta\eta_t$. Hence, the two forms
are equivalent.


The General Trend plus Irregular Model


The random walk plus noise and random walk plus drift model are the building
blocks of more complex time-series models. For example, the noise and drift
components can easily be incorporated into a single model by modifying (3.39) such
that the trend in $y_t$ contains a deterministic and stochastic component. Specifically,
replace (3.39) with


$$\mu_t = \mu_{t-1} + a_0 + \varepsilon_t \tag{3.41}$$


where $a_0$ = a constant
$\{\varepsilon_t\}$ = a white-noise process


Here, the trend $\mu_t$ contains the stochastic change $\varepsilon_t$ and deterministic change $a_0$.
To establish this point, use (3.37) to obtain the solution for $\mu_t$ as


$$\mu_t = \mu_0 + a_0 t + \sum_{i=1}^{t}\varepsilon_i$$


Now combine the deterministic and stochastic trends with the noise term to obtain


$$y_t = \mu_0 + a_0 t + \sum_{i=1}^{t}\varepsilon_i + \eta_t \tag{3.42}$$


If we impose the initial condition $y_0 = \mu_0 + \eta_0$, the solution for $y_t$ is


$$y_t = y_0 - \eta_0 + a_0 t + \sum_{i=1}^{t}\varepsilon_i + \eta_t \tag{3.43}$$


Equations (3.38) and (3.41) are called the trend plus noise model; $y_t$ is the sum
of a deterministic trend, stochastic trend, and pure white-noise term. Of course, the
noise sequence does not need to be a white-noise process. Let A(L) be a polynomial
in the lag operator L; it is possible to augment a random walk plus drift process
with the stationary noise process $A(L)\eta_t$, so that the general trend plus irregular
model is


$$y_t = \mu_0 + a_0 t + \sum_{i=1}^{t}\varepsilon_i + A(L)\eta_t \tag{3.44}$$


The Local Linear Trend Model


The local linear trend model is built by combining several random walk plus noise
processes. Let $\{\varepsilon_t\}$, $\{\eta_t\}$, and $\{\delta_t\}$ be three mutually uncorrelated white-noise
processes. The local linear trend model can be represented by


$$y_t = \mu_t + \eta_t$$
$$\mu_t = \mu_{t-1} + a_t + \varepsilon_t$$
$$a_t = a_{t-1} + \delta_t \tag{3.45}$$


The local linear trend model consists of the noise term $\eta_t$ plus the stochastic
trend term $\mu_t$. What is interesting about the model is that the change in the trend is
a random walk plus noise: that is, $\Delta\mu_t$ is equal to the random walk term $a_t$ plus the
noise term $\varepsilon_t$. Since this is the most detailed model thus far, it is useful to show that
the other processes are special cases of the local linear trend model. For example,


1. The random walk plus noise: If all values of the $\{a_t\}$ sequence are equal to
zero, (3.45) is a random walk ($\mu_t = \mu_{t-1} + \varepsilon_t$) plus noise ($\eta_t$). Let var(δ) = 0, so
that $a_t = a_{t-1} = \cdots = a_0$. If $a_0 = 0$, $\mu_t = \mu_{t-1} + \varepsilon_t$ so that $y_t$ is the random walk $\mu_t$
plus the noise term $\eta_t$.


2. The random walk plus drift: Again, let var(δ) = 0, so that $a_t = a_{t-1} = \cdots = a_0$.
Now if $a_0$ differs from zero, the trend is the random walk plus drift: $\mu_t = \mu_{t-1} +
a_0 + \varepsilon_t$. Thus, (3.45) becomes the trend plus noise model. If we further restrict the
model such that var(η) = 0, the model becomes the pure random walk plus drift
model.


The solution for $y_t$ can easily be found as follows. First, solve for $a_t$ as


$$a_t = a_0 + \sum_{i=1}^{t}\delta_i$$


Next, use this solution to write $\mu_t$ as


$$\mu_t = \mu_{t-1} + a_0 + \sum_{i=1}^{t}\delta_i + \varepsilon_t$$


so that


$$\mu_t = \mu_0 + \sum_{i=1}^{t}\varepsilon_i + t(a_0 + \delta_1) + (t - 1)\delta_2 + (t - 2)\delta_3 + \cdots + \delta_t$$


Since $y_0 = \mu_0 + \eta_0$, the solution for $y_t$ is


$$y_t = y_0 + (\eta_t - \eta_0) + \sum_{i=1}^{t}\varepsilon_i + t(a_0 + \delta_1) + (t - 1)\delta_2 + (t - 2)\delta_3 + \cdots + \delta_t$$


Here, we can see the combined properties of all the other models. Each element
in the $\{y_t\}$ sequence contains a deterministic trend, a stochastic trend, and an
irregular term. The stochastic trend is $\sum\varepsilon_i$ and the irregular term $\eta_t$. Of course, in a more
general version of the model, the irregular term could be given by $A(L)\eta_t$. What is
most interesting about the model is the form of the deterministic time trend. Rather
than being deterministic, the coefficient on time depends on the current and past
realizations of the $\{\delta_t\}$ sequence. If in period t, the realized value of the sum $a_0 +
\delta_1 + \cdots + \delta_t$ happens to be positive, the coefficient of t will be positive. Of course,
this sum can be positive for some values of t and negative for others. The simulated
local linear trend model shown in graph (d) happens to have a sustained positive
slope since there were more positive draws in the 100 values of $\{\delta_t\}$ than negative
values.
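
The recursions in (3.45) and the special cases listed above can be checked with a short Python sketch. It is illustrative only: the function name `simulate_local_linear_trend`, the seed, and the variances chosen for the three shock sequences are assumptions and do not reproduce graph (d).

```python
import random


def simulate_local_linear_trend(eps, eta, delta, mu0=0.0, a0=0.0):
    """Return [y_0..y_T] for y_t = mu_t + eta_t, mu_t = mu_{t-1} + a_t + e_t, a_t = a_{t-1} + d_t.

    eps and delta hold values for t = 1..T; eta holds values for t = 0..T.
    """
    mu, a = mu0, a0
    y = [mu0 + eta[0]]
    for t in range(1, len(eps) + 1):
        a = a + delta[t - 1]         # slope follows a random walk
        mu = mu + a + eps[t - 1]     # trend changes by the current slope plus a shock
        y.append(mu + eta[t])        # observed series adds the irregular term
    return y


rng = random.Random(7)
T = 100
eps = [rng.gauss(0.0, 1.0) for _ in range(T)]
eta = [rng.gauss(0.0, 1.0) for _ in range(T + 1)]
delta = [rng.gauss(0.0, 0.1) for _ in range(T)]
y = simulate_local_linear_trend(eps, eta, delta)

# Special case: var(delta) = 0 and a0 = 0 gives the random walk plus noise model.
y_rw_noise = simulate_local_linear_trend(eps, eta, [0.0] * T)
print(y[-1], y_rw_noise[-1])
```

Setting the slope shocks to zero with a nonzero initial slope would instead give the trend plus noise model, matching the second special case.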




The Forecast Function


If we update the solution for $y_t$ by s periods, it is simple to demonstrate that


$$y_{t+s} = y_0 + (\eta_{t+s} - \eta_0) + \sum_{i=1}^{t+s}\varepsilon_i + (t + s)(a_0 + \delta_1) + (t + s - 1)\delta_2 + (t + s - 2)\delta_3 + \cdots + \delta_{t+s}$$


so that


$$y_{t+s} = y_t + (\eta_{t+s} - \eta_t) + \sum_{i=1}^{s}\varepsilon_{t+i} + s(a_0 + \delta_1 + \delta_2 + \cdots + \delta_t) + \sum_{i=1}^{s}(s + 1 - i)\delta_{t+i}$$


Taking conditional expectations yields


$$E_t y_{t+s} = (y_t - \eta_t) + s(a_0 + \delta_1 + \delta_2 + \cdots + \delta_t)$$


The forecast of $y_{t+s}$ is the current value of $y_t$ less the transitory component $\eta_t$ plus
s multiplied by the slope of the trend term in t.


9. REMOVING THE TREND


You have seen that a trend can have deterministic and stochastic components. The
form of the trend has important implications for the appropriate transformation to
attain a stationary series. The usual methods for eliminating the trend are differencing
and detrending. Detrending entails regressing a variable on "time" and saving
the residuals. We have already examined an ARIMA(p, d, q) model in which the
dth difference of a series is stationary. The aim of this section is to compare these
two methods of eliminating the trend.


Differencing


First consider the solution for the random walk plus drift model:


$$y_t = y_0 + a_0 t + \sum_{i=1}^{t}\varepsilon_i$$


Taking the first difference, we obtain $\Delta y_t = a_0 + \varepsilon_t$. Clearly, the $\{\Delta y_t\}$ sequence,
equal to a constant plus a white-noise disturbance, is stationary. Viewing $\Delta y_t$ as
the variable of interest, we have


$$E(\Delta y_t) = E(a_0 + \varepsilon_t) = a_0$$
$$\text{var}(\Delta y_t) = E(\Delta y_t - a_0)^2 = E(\varepsilon_t)^2 = \sigma^2$$

and

$$\text{cov}(\Delta y_t, \Delta y_{t-s}) = E[(\Delta y_t - a_0)(\Delta y_{t-s} - a_0)] = E(\varepsilon_t\varepsilon_{t-s}) = 0$$


Since the mean and variance are constants and the covariance between $\Delta y_t$ and
$\Delta y_{t-s}$ depends solely on s, the $\{\Delta y_t\}$ sequence is stationary.

The random walk plus noise model is an interesting case study. In first differences,
the model can be written as $\Delta y_t = \varepsilon_t + \Delta\eta_t$. In this form, it is easy to show
that $\Delta y_t$ is stationary. Notice the following:


$$E(\Delta y_t) = E(\varepsilon_t + \Delta\eta_t) = 0$$
$$\text{var}(\Delta y_t) = E(\Delta y_t)^2 = E[(\varepsilon_t + \Delta\eta_t)^2]$$
$$= E[(\varepsilon_t)^2 + 2\varepsilon_t\Delta\eta_t + (\Delta\eta_t)^2]$$
$$= \sigma^2 + 2E(\varepsilon_t\Delta\eta_t) + E[(\eta_t)^2 - 2\eta_t\eta_{t-1} + (\eta_{t-1})^2] = \sigma^2 + 2\sigma_\eta^2$$
$$\text{cov}(\Delta y_t, \Delta y_{t-1}) = E[(\varepsilon_t + \eta_t - \eta_{t-1})(\varepsilon_{t-1} + \eta_{t-1} - \eta_{t-2})] = -\sigma_\eta^2$$
$$\text{cov}(\Delta y_t, \Delta y_{t-s}) = E[(\varepsilon_t + \eta_t - \eta_{t-1})(\varepsilon_{t-s} + \eta_{t-s} - \eta_{t-s-1})] = 0 \quad \text{for } s \geq 2$$


If we set s = 1, the correlation coefficient between $\Delta y_t$ and $\Delta y_{t-1}$ is


$$\rho(1) = \frac{\text{cov}(\Delta y_t, \Delta y_{t-1})}{\text{var}(\Delta y_t)} = \frac{-\sigma_\eta^2}{\sigma^2 + 2\sigma_\eta^2}$$


Examination reveals $-0.5 < \rho(1) < 0$ and that all other correlation coefficients are
zero. Since the first difference of $y_t$ acts exactly as an MA(1) process, the random
walk plus noise model is ARIMA(0, 1, 1). Since adding a constant to a series has
no effect on the correlogram, it additionally follows that the trend plus noise model
of (3.43) also acts as an ARIMA(0, 1, 1) process.

The local linear trend model acts as an ARIMA(0, 2, 2) model. Taking the first
and second difference of $y_t$ in this model, we obtain


$$\Delta y_t = \Delta\mu_t + \Delta\eta_t$$
$$= a_t + \varepsilon_t + \Delta\eta_t$$

so that

$$\Delta^2 y_t = \Delta a_t + \Delta\varepsilon_t + \Delta^2\eta_t$$
$$= \delta_t + \Delta\varepsilon_t + \Delta^2\eta_t$$

Since $a_t$ itself is nonstationary, it is straightforward to show that the first
difference of $y_t$ is not stationary. Examining $\Delta^2 y_t$, we note


$$E(\Delta^2 y_t) = E(\delta_t + \Delta\varepsilon_t + \Delta^2\eta_t) = 0$$

$$\text{var}(\Delta^2 y_t) = E[(\delta_t + \Delta\varepsilon_t + \Delta^2\eta_t)^2]$$
$$= E[(\delta_t)^2 + (\varepsilon_t - \varepsilon_{t-1})^2 + (\eta_t - 2\eta_{t-1} + \eta_{t-2})^2]$$
$$= \sigma_\delta^2 + 2\sigma^2 + 6\sigma_\eta^2$$

$$\text{cov}(\Delta^2 y_t, \Delta^2 y_{t-1}) = E[(\delta_t + \varepsilon_t - \varepsilon_{t-1} + \eta_t - 2\eta_{t-1} + \eta_{t-2})(\delta_{t-1} + \varepsilon_{t-1} - \varepsilon_{t-2} + \eta_{t-1} - 2\eta_{t-2} + \eta_{t-3})]$$
$$= -\sigma^2 - 4\sigma_\eta^2$$

$$\text{cov}(\Delta^2 y_t, \Delta^2 y_{t-2}) = E[(\delta_t + \varepsilon_t - \varepsilon_{t-1} + \eta_t - 2\eta_{t-1} + \eta_{t-2})(\delta_{t-2} + \varepsilon_{t-2} - \varepsilon_{t-3} + \eta_{t-2} - 2\eta_{t-3} + \eta_{t-4})]$$
$$= \sigma_\eta^2$$


Here $\sigma_\delta^2$ denotes the variance of $\delta_t$, and the cross products of the mutually
uncorrelated shocks have zero expectation. Since all other covariance terms are zero,
the local linear trend model acts as an ARIMA(0, 2, 2). You should be able to show
that the correlogram is such that $-2/3 < \rho(1) < 0$, $0 < \rho(2) < 1/6$ and all other
values of $\rho(s)$ are zero.

Now consider a general class of ARIMA(p, d, q) models:


$$A(L)y_t = B(L)\varepsilon_t \tag{3.46}$$


where A(L) and B(L) are polynomials in the lag operator L. Suppose that A(L) has
a single unit root and that all roots of B(L) lie outside the unit circle. We can factor
A(L) into (1 − L)A*(L) where A*(L) is a polynomial of order p − 1. Since A(L) has
only one unit root, it follows that all roots of A*(L) are outside of the unit circle.
Thus, we can write (3.46) as


$$(1 - L)A^*(L)y_t = B(L)\varepsilon_t$$

Now define $y_t^* = \Delta y_t$ so that

$$A^*(L)y_t^* = B(L)\varepsilon_t \tag{3.47}$$


The $\{y_t^*\}$ sequence is stationary since all roots of A*(L) lie outside the unit circle.
The point is that the first difference of a unit root process is stationary. If A(L) has
two unit roots, the same argument can be used to show that the second difference of
$\{y_t\}$ is stationary. The general point is that the dth difference of a process with d
unit roots is stationary. An ARIMA(p, d, q) model has d unit roots; the dth difference
of such a model is a stationary ARMA(p, q) process. If a series has d unit
roots, it is said to be integrated of order d or simply I(d).


Detrending


We have shown that differencing can sometimes be used to transform a nonstationary
model into a stationary model with an ARMA representation. This does not
mean that all nonstationary models can be transformed into a well-behaved ARMA
model by appropriate differencing. Consider, for example, a model that is the sum
of a deterministic trend component and pure noise component:


$$y_t = y_0 + a_1 t + \varepsilon_t$$


The first difference of $y_t$ is $\Delta y_t = a_1 + \varepsilon_t - \varepsilon_{t-1}$, which is not invertible in the
sense that it cannot be expressed in the form of an autoregressive process. Recall
that the invertibility of a stationary process requires that the MA component not
have a unit root.

Instead, an appropriate way to transform this model is to estimate the regression
equation $y_t = a_0 + a_1 t + \varepsilon_t$. Subtracting the estimated values of $y_t$ from the observed
series yields estimated values of the $\{\varepsilon_t\}$ series. More generally, a time series may
have the polynomial trend


$$y_t = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \cdots + a_n t^n + e_t$$


where $\{e_t\}$ = a stationary process


To choose the degree of the polynomial, estimate the regression equation using the
largest value of n deemed reasonable. If the t-statistic indicates that $a_n$ is not
statistically different from zero, consider a polynomial trend of order n − 1.
Continue to pare down the order of the polynomial trend until a nonzero coefficient
is found. F-tests can be used to determine whether group coefficients, say,
$a_{n-i}$ through $a_n$, are statistically different from zero. The AIC and SBC statistics
can be used to reconfirm the appropriate degree of the polynomial.

The difference between the estimated values of the $\{y_t\}$ sequence and the actual
values yields an estimate of the stationary sequence $\{e_t\}$. The detrended process [...]


Difference Versus Trend Stationary Models


We have encountered two different types of nonstationary time-series models:
those with a stochastic trend and those with a deterministic trend. The economic [...]




formed into a stationary model by removing the deterministic trend. A serious problem is encountered when the inappropriate method is used to eliminate trend. We saw an example of the problem in attempting to difference the equation $y_t = y_0 + \alpha_1 t + \varepsilon_t$. Consider a trend stationary process of the form

$$A(L)y_t = \alpha_0 + \alpha_1 t + e_t$$

where the characteristic roots of the polynomial $A(L)$ are all outside the unit circle and the expression $e_t$ is allowed to have the MA form $e_t = B(L)\varepsilon_t$. Subtracting an estimate of the deterministic time trend gives a stationary and invertible ARMA model. However, if we use the notation of (3.47), the first difference of such a model yields

$$A(L)y_t^* = \alpha_1 + (1 - L)B(L)\varepsilon_t$$

First-differencing the TS process has introduced a noninvertible unit root process into the MA component of the model. Of course, the same problem is introduced into a model with a polynomial time trend.

In the same way, subtracting a deterministic time trend from a difference stationary process is also inappropriate. In the random walk plus drift model above, subtracting $y_0 + a_0 t$ from each observation does not result in a stationary series since the stochastic trend is not eliminated. More generally, incorporating a deterministic trend component in a regression when none exists results in a misspecification error if the process actually contains a unit root. You might be tempted to think it possible to estimate the deterministic trend from the data using such a regression. Unfortunately, all such coefficients are statistical artifacts in the presence of a nonstationary error term.
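The two mistakes described above can be illustrated with a small simulation. This is only a sketch under assumed values: the sample size, the slope 0.05, the seed, and the helper names `acf` and `detrend` are illustrative choices, not taken from the text. Differencing a trend stationary series should produce a lag-1 autocorrelation near -0.5 (the signature of the noninvertible MA term), while detrending a random walk plus drift should leave residuals whose autocorrelation stays close to 1.

```python
import numpy as np

def acf(x, k):
    """Sample autocorrelation of x at lag k (k >= 1)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[k:], x[:-k]) / np.dot(x, x))

rng = np.random.default_rng(0)      # arbitrary seed (assumption)
T = 2000
t = np.arange(T)
eps = rng.standard_normal(T)

# Trend stationary (TS) series: y_t = alpha_0 + alpha_1 * t + eps_t
ts = 1.0 + 0.05 * t + eps
# Difference stationary (DS) series: random walk plus drift
ds = np.cumsum(0.05 + eps)

def detrend(y):
    """Residuals from an OLS regression of y on a constant and time."""
    X = np.column_stack([np.ones(T), t])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

# Inappropriate transformations
rho_diff_ts = acf(np.diff(ts), 1)    # differencing a TS series
rho_detr_ds = acf(detrend(ds), 1)    # detrending a DS series
# Appropriate transformations
rho_detr_ts = acf(detrend(ts), 1)    # detrending a TS series
rho_diff_ds = acf(np.diff(ds), 1)    # differencing a DS series

print(rho_diff_ts, rho_detr_ds, rho_detr_ts, rho_diff_ds)
```

With the appropriate transformation, both series should show a lag-1 autocorrelation close to zero.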


The Yen/Dollar Exchange Rate: An Example 


The random walk shown in Figure 3.12 might fool a researcher into thinking the series is actually trend stationary. Instead of focusing on simulated data, consider the time path of the yen/dollar exchange rate illustrated in Figure 3.7. The yen rose by more than 60% during the 21-year period 1971 through 1991. Economic theory suggests no reason to expect the nominal yen/dollar rate to have a deterministic component; in fact, some versions of the efficient market hypothesis suggest that the yen/dollar rate must have a stochastic trend. However, it is interesting to consider the consequences of detrending the yen/dollar rate. If $y_t$ denotes the yen/dollar exchange rate, regressing $y_t$ on a constant and time yields

$$y_t = \underset{(44.91)}{0.8479} - \underset{(-14.16)}{0.0064}\,\text{time} + e_t$$

The t-statistics (shown in parentheses) indicate that the coefficients are highly significant. The residuals from this regression, the $\{e_t\}$ sequence, are the detrended values of the yen/dollar exchange rate. The top portion (a) of Figure 3.13 shows the ACF and PACF of the detrended exchange rate; as you can clearly see, the ACF does not die out after 16 quarters. Here, detrending the data does not result in a stationary series. The lower portion (b) of the figure shows the ACF and PACF of the logarithmic change in the yen/dollar rate. The single spike at lag 1 is suggestive of an AR(1) or an MA(1) model. The negative correlation coefficients at lags 5 and 15 do not suggest any particular seasonal patterns and may be spurious. With 70 usable observations, $2T^{-1/2} = 0.239$ is almost exactly equal to the PACF coefficient at lag 5. The results of two alternative estimations of the logarithmic change in the yen/dollar rate are shown in Table 3.1. For both models, the estimated intercept ($a_0$) is not statistically different from zero at conventional significance levels. The Q-statistics for autocorrelations up to 17 ($T/4 \approx 17$) show that as a group, all can be treated as being equal to zero. However, by a small margin, the SBC selects the MA(1) model. The critical point is that either of these models using differenced data will be vastly superior to a model of the detrended yen/dollar rate.
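The significance band and the Q-statistic used in this example can be computed along the following lines. This is a minimal sketch: the helper names are hypothetical, the Ljung-Box form of the Q-statistic is assumed (the passage does not say which variant was used), and the simulated series merely stands in for the 70 logarithmic changes, which are not reproduced here.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[k:], x[:-k]) / denom
                     for k in range(1, max_lag + 1)])

def ljung_box_q(x, max_lag):
    """Ljung-Box statistic: T(T+2) * sum_k r_k^2 / (T - k)."""
    T = len(x)
    r = sample_acf(x, max_lag)
    lags = np.arange(1, max_lag + 1)
    return float(T * (T + 2) * np.sum(r**2 / (T - lags)))

T = 70                          # usable observations, as in the text
band = 2 / np.sqrt(T)           # two-standard-deviation band for the ACF/PACF
max_lag = T // 4                # integer part of T/4

rng = np.random.default_rng(0)  # arbitrary seed (assumption)
stand_in = rng.standard_normal(T)   # placeholder for the actual data
q_stat = ljung_box_q(stand_in, max_lag)

print(round(band, 3), max_lag, q_stat)
```

Under the null of no autocorrelation, the statistic is compared with a chi-square distribution; the degrees of freedom depend on how many ARMA coefficients were estimated.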


10. ARE THERE BUSINESS CYCLES? 


Traditional business cycle research decomposed real macroeconomic variables into a deterministic secular trend, a cyclical, and an irregular component. The typical decomposition is illustrated by the hypothetical data in Figure 3.14. The secular trend, portrayed by the straight line, was deemed to be in the domain of growth theory. The slope of the trend line was thought to be determined by long-run factors such as technological growth, fertility, and educational attainment levels. One source of the deviations from trend occurs because of the wavelike motion of real economic activity called the business cycle. Although the actual period of the cycle was never thought to be as regular as that depicted in the figure, the periods of prosperity and recovery were regarded to be as inevitable as the tides. The goal of monetary and fiscal policy was to reduce the amplitude of the cycle (measured by distance ab). In terms of our previous discussion, the trend is the nonstationary component of growth and the cyclical and irregular components are stationary.


Table 3.1 Alternative Estimates of the Yen/Dollar Exchange Rate

Model    Estimates(a)                 Q(17) Statistic(b)         SBC(c)
AR(1)    a_0: -0.0104 (0.0095)        Q(17) = 19.06 (0.3249)     -114.359
         a_1:  0.3684 (0.1148)
MA(1)    a_0: -0.0116 (0.0082)        Q(17) = 19.22 (0.2573)     -114.932
         β_1:  0.3686 (0.1123)

(a) Standard errors are in parentheses.
(b) T/4 is approximately 17.
(c) Since both models have the same number of parameters, both the AIC and SBC select the same model.




Figure 3.13 ACF and PACF of (a) the detrended yen and (b) the logarithmic change in the yen, lags 0 through 16.


Although there have been recessions and periods of high prosperity, the post-World War II experience taught us that business cycles do not have a regular period. Even so, there is a widespread belief that over the long run, macroeconomic variables grow at a constant trend rate and that any deviations from trend are eventually eliminated by the “invisible hand.” The belief that trend is unchanging over time leads to the common practice of “detrending” macroeconomic data using a linear (or polynomial) deterministic regression equation.




This detrending procedure might entail estimating real GNP using the regression $y_t = \alpha_0 + \alpha_1 t + \varepsilon_t$. The calculated residuals are the detrended data. Subtracting the trend from each observation might yield something similar to the lower graph of Figure 3.14; the deviations from the cycle are the irregular components of the series. If the residuals are actually stationary, the cyclical and irregular components can be fitted using traditional means.

Figure 3.14 The business cycle?

The problem with this type of analysis is that the trend may not be deterministic. As we have seen, it is improper to subtract a deterministic trend from a difference stationary series. The economic significance of real macroeconomic variables being difference stationary, rather than trend stationary, is profound. If a variable is trend stationary, current economic shocks of any variety will not have any long-run effects on the series. Consider the forecast function from the trend stationary model $y_t = \alpha_0 + \alpha_1 t + \varepsilon_t$ above. If $\varepsilon_t$ is a white-noise process, the forecast of $y_{t+s}$ is $\alpha_0 + \alpha_1(t + s)$ for all s; neither current nor past events affect the very long-run forecast of the future y values. More important, given the values $\alpha_0$ and $\alpha_1$, the forecast error variance is constant. The forecast error for any s is always $\varepsilon_{t+s}$; hence, the forecast error variance for any s is $\text{var}(\varepsilon_{t+s})$. Even if $\{\varepsilon_t\}$ is serially correlated, long-term forecasts will eventually depend only on $\alpha_0$, $\alpha_1$, and the forecast horizon (s).

This is in stark contrast to the case in which the $\{y_t\}$ series has a stochastic trend. Consider the simple random walk plus noise model $y_t = \mu_t + \eta_t$, where $\mu_t = \mu_{t-1} + \varepsilon_t$. Given the initial condition for $y_t$, we can solve for $y_{t+s}$ as

$$y_{t+s} = y_t + \sum_{i=1}^{s}\varepsilon_{t+i} + \eta_{t+s} - \eta_t$$

Notice that the forecast error variance becomes unbounded for long-term forecasts. The s-step ahead forecast of $y_{t+s}$ is

$$E_t y_{t+s} = y_t - \eta_t$$

so that the s-step ahead forecast error variance is

$$\text{var}(y_{t+s} - E_t y_{t+s}) = \text{var}\left(\sum_{i=1}^{s}\varepsilon_{t+i} + \eta_{t+s}\right) = s\sigma^2 + \sigma_\eta^2$$

As we forecast further into the future, the confidence interval surrounding our forecasts grows progressively larger. As $s \to \infty$, the variance of the forecast error becomes infinitely large.
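The growth of the forecast error variance can be checked by Monte Carlo simulation. The sketch below assumes normally distributed, mutually independent shocks and uses illustrative standard deviations; none of these values come from the text.

```python
import numpy as np

rng = np.random.default_rng(1)     # arbitrary seed (assumption)
sigma, sigma_eta = 1.0, 0.5        # illustrative std. deviations of eps and eta
n_paths = 100_000

def forecast_error_variance(s):
    """Monte Carlo variance of y_{t+s} - E_t y_{t+s} for the
    random walk plus noise model: sum of s future eps plus eta_{t+s}."""
    eps = rng.normal(0.0, sigma, size=(n_paths, s))
    eta_future = rng.normal(0.0, sigma_eta, size=n_paths)
    errors = eps.sum(axis=1) + eta_future
    return float(errors.var())

for s in (1, 5, 20):
    theory = s * sigma**2 + sigma_eta**2
    print(s, forecast_error_variance(s), theory)
```

The simulated variance should rise roughly linearly in s, whereas the forecast error variance of the trend stationary model stays at $\text{var}(\varepsilon_{t+s})$ for every horizon.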

Nelson and Plosser (1982) challenged the traditional view by demonstrating that important macroeconomic variables are DS rather than TS processes. They obtained time-series data for 13 important macroeconomic time series: real GNP, nominal GNP, industrial production, employment, unemployment rate, GNP deflator, consumer prices, wages, real wages, money stock, velocity, bond yields, and an index of common stock prices. The samples began as early as 1860 for consumer prices and as late as 1909 for GNP data; all series ended in 1970. Some of their findings are reported in Table 3.2. The first two columns report the first- and second-order autocorrelations of real and nominal GNP, industrial production, and the unemployment rate. Notice that the autocorrelations of the first three series are strongly indicative of a unit root process. Although ρ(1) for the unemployment rate is 0.75, the second-order autocorrelation is less than 0.5.

First differences of the series yield the first- and second-order sample autocorrelations r(1) and r(2), respectively. Sample autocorrelations of the first differences are indicative of stationary processes. The evidence supports the claim that the data are generated from DS processes. Nelson and Plosser point out that the positive autocorrelation of differenced real and nominal GNP at lag 1 only is suggestive of an MA(1) process. To further strengthen the argument for DS-generating processes, recall that differencing a TS process yields a noninvertible moving average process. None of the differenced series reported by Nelson and Plosser appear to have a unit root in the MA terms.

Table 3.2 Selected Autocorrelations from Nelson and Plosser

                         ρ(1)    ρ(2)    r(1)    r(2)    d(1)    d(2)
Real GNP                 0.95    0.90    0.34    0.04    0.87    0.06
Nominal GNP              0.95    0.89    0.44    0.08    0.93    0.79
Industrial production    0.97    0.94    0.03   -0.11    0.84    0.67
Unemployment rate        0.75    0.47    0.09   -0.29    0.75    0.40

Notes: 1. Full details of the correlogram can be obtained from Nelson and Plosser (1982), who report the first six sample autocorrelations.
2. Respectively, ρ(i), r(i), and d(i) refer to the ith-order autocorrelation coefficient of each series, first difference of the series, and detrended values of the series.

The results from fitting a linear trend to the data and forming sample autocorrelations of the residuals are shown in the last two columns of the table. An interesting feature of the data is that the sample autocorrelations of the detrended data are reasonably high. This is consistent with the fact that detrending a DS series will not eliminate the nonstationarity. Notice that detrending the unemployment rate has no effect on the autocorrelations.

Rather than rely solely on an analysis of correlograms, it is possible to formally test whether a series is difference stationary. We examine such formal tests in the next chapter. The testing procedure is not as straightforward as it might seem. We cannot use the usual statistical techniques since classical procedures all presume that the data are stationary. For now, it suffices to say that Nelson and Plosser are not able to reject the null hypothesis that their data are DS. If this view is correct, macroeconomic variables do not grow at a smooth long-run rate. Some macroeconomic shocks are of a permanent nature; the effects of such shocks are never eliminated.

11. STOCHASTIC TRENDS AND UNIVARIATE DECOMPOSITIONS

Nelson and Plosser’s (1982) findings suggest that many economic time series have a stochastic trend and an irregular component. Having observed a series, but not the individual components, is there any way to decompose the series into the constituent parts? Numerous economic theories suggest it is important to distinguish between temporary and permanent movements in a series. A sale (i.e., a temporary price decline) is designed to induce us to purchase now, rather than in the future. Labor economists argue that “hours supplied” is more responsive to a temporary wage increase than a permanent increase. The idea is that workers will temporarily substitute income for leisure time. Certainly, the modern theories of the consumption function that classify an individual’s income into permanent and transitory components highlight the importance of such a decomposition.

Any such decomposition is straightforward if it is known that the trend in $\{y_t\}$ is purely deterministic. For example, a linear time trend induces a fixed change each and every period. This deterministic change can be subtracted from the actual change in $y_t$ to obtain the change resulting from the irregular component. If, as in Section 9, there is a polynomial trend, simple detrending using OLS will yield the irregular component of the series.

A difficult conceptual issue arises if the trend is stochastic. For example, suppose you are asked to measure the current phase of the business cycle. If the trend in GNP is stochastic, how is it possible to tell if the GNP is above or below trend? The traditional measurement of a recession by consecutive quarterly declines in real GNP is not helpful. After all, if GNP has a trend component, a negative realization for the irregular component may be outweighed by the positive trend component.

If it is possible to decompose a sequence into its separate permanent and stationary components, the issue can be solved. To better understand the nature of stochastic trends, note that, in contrast to a deterministic trend, a stochastic trend increases on average by a fixed amount each period. For example, consider the random-walk plus drift model of (3.36):

$$y_t = y_{t-1} + a_0 + \varepsilon_t$$

Since $E\varepsilon_t = 0$, the average change in $y_t$ is the deterministic constant $a_0$. Of course, in any period t, the actual change will differ from $a_0$ by the stochastic quantity $\varepsilon_t$. However, each sequential change in $\{y_t\}$ adds to its level, regardless of whether the change results from the deterministic or stochastic component. As we saw in (3.37), the random walk plus drift model has no irregular component; hence, it is a model of pure trend.

The idea that a random walk plus drift is a pure trend has proven especially useful in time-series analysis. Beveridge and Nelson (1981) show how to decompose any ARIMA(p, 1, q) model into the sum of a random walk plus drift and stationary component (i.e., the general trend plus irregular model). Before considering the general case, begin with the simple example of an ARIMA(0, 1, 2) model:

$$y_t = y_{t-1} + a_0 + \varepsilon_t + \beta_1\varepsilon_{t-1} + \beta_2\varepsilon_{t-2} \tag{3.48}$$

If $\beta_1 = \beta_2 = 0$, (3.48) is nothing more than the pure random walk plus drift model. The introduction of the two moving average terms adds an irregular component to the $\{y_t\}$ sequence. The first step in understanding the Beveridge and Nelson (1981) procedure is to obtain the forecast function. For now, keep the issue simple by defining $e_t = \varepsilon_t + \beta_1\varepsilon_{t-1} + \beta_2\varepsilon_{t-2}$, so that we can write $y_t = y_{t-1} + a_0 + e_t$. Given an initial condition for $y_0$, the general solution for $y_t$ is

$$y_t = a_0 t + y_0 + \sum_{i=1}^{t} e_i \tag{3.49}$$

Updating by s periods, we get

$$y_{t+s} = a_0(t + s) + y_0 + \sum_{i=1}^{t+s} e_i \tag{3.50}$$

Substituting (3.49) into (3.50) so as to eliminate $y_0$ yields

$$y_{t+s} = a_0 s + y_t + \sum_{i=1}^{s} e_{t+i} \tag{3.51}$$

To express the solution for $y_{t+s}$ in terms of $\{\varepsilon_t\}$ rather than $\{e_t\}$, note that

$$\sum_{i=1}^{s} e_{t+i} = \sum_{i=1}^{s}\varepsilon_{t+i} + \beta_1\sum_{i=1}^{s}\varepsilon_{t-1+i} + \beta_2\sum_{i=1}^{s}\varepsilon_{t-2+i} \tag{3.52}$$

so the solution for $y_{t+s}$ can be written as

$$y_{t+s} = a_0 s + y_t + \sum_{i=1}^{s}\varepsilon_{t+i} + \beta_1\sum_{i=1}^{s}\varepsilon_{t-1+i} + \beta_2\sum_{i=1}^{s}\varepsilon_{t-2+i} \tag{3.53}$$

Now consider the forecast of $y_{t+s}$ for various values of s. Since all values of $E_t\varepsilon_{t+i} = 0$ for $i > 0$, it follows that

$$E_t y_{t+1} = a_0 + y_t + \beta_1\varepsilon_t + \beta_2\varepsilon_{t-1}$$

$$E_t y_{t+2} = 2a_0 + y_t + (\beta_1 + \beta_2)\varepsilon_t + \beta_2\varepsilon_{t-1}$$

$$E_t y_{t+s} = s a_0 + y_t + (\beta_1 + \beta_2)\varepsilon_t + \beta_2\varepsilon_{t-1} \tag{3.54}$$
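The forecast function in (3.54) can be verified numerically by iterating the ARIMA(0, 1, 2) model forward with all future shocks set to their expected value of zero. The parameter values and the information available at t are hypothetical numbers chosen only for illustration.

```python
# Illustrative parameter values (assumptions, not estimates from the text)
a0, b1, b2 = 0.1, 0.4, 0.2
# Hypothetical information available at time t
y_t, eps_t, eps_tm1 = 3.0, 0.5, -0.3

def forecast(s):
    """E_t y_{t+s}, obtained by iterating y_{t+j} = y_{t+j-1} + a0 + e_{t+j},
    where e_{t+j} = eps_{t+j} + b1*eps_{t+j-1} + b2*eps_{t+j-2}
    and every future eps is replaced by its expectation of zero."""
    known = {0: eps_t, -1: eps_tm1}          # eps_{t+j} for j <= 0
    y = y_t
    for j in range(1, s + 1):
        e = (known.get(j, 0.0)
             + b1 * known.get(j - 1, 0.0)
             + b2 * known.get(j - 2, 0.0))
        y += a0 + e
    return y

# Stochastic level implied by (3.54)
level = y_t + (b1 + b2) * eps_t + b2 * eps_tm1

for s in (1, 2, 5, 10):
    print(s, forecast(s), s * a0 + level)
```

For s = 1 the two columns differ by $\beta_2\varepsilon_t$, since the second moving average term has not yet entered; for every larger s they coincide.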


Here, the forecasts for all $s > 1$ are equal to the expression $s a_0 + y_t + (\beta_1 + \beta_2)\varepsilon_t + \beta_2\varepsilon_{t-1}$. Thus, the forecast function converges to a linear function of the forecast horizon s; the slope of the function equals $a_0$ and the level equals $y_t + (\beta_1 + \beta_2)\varepsilon_t + \beta_2\varepsilon_{t-1}$. This stochastic level can be called the trend at t; in terms of our earlier notation, this trend is denoted by $\mu_t$. This trend plus the deterministic value $a_0 s$ constitutes the forecast $E_t y_{t+s}$. There are several interesting points to note:

1. The trend is defined to be the conditional expectation of the limiting value of the forecast function. In lay terms, the trend is the “long-term” forecast. This forecast will differ at each period t as additional realizations of $\{\varepsilon_t\}$ become available. At any period t, the irregular component of the series is the difference between $y_t$ and the trend $\mu_t$. Hence, the irregular component of the series is

$$y_t - \mu_t = -(\beta_1 + \beta_2)\varepsilon_t - \beta_2\varepsilon_{t-1} \tag{3.55}$$

At any point in time, the trend and irregular components are perfectly correlated (the correlation coefficient being -1).

2. By definition, $\varepsilon_t$ is the innovation in $y_t$ and the variance of the innovation is $\sigma^2$. Since the change in the trend resulting from a change in $\varepsilon_t$ is $1 + \beta_1 + \beta_2$, the variance of the innovation in the trend can exceed the variance of the innovation in $y_t$ itself. If $(1 + \beta_1 + \beta_2)^2 > 1$, the trend is more volatile than $y_t$ since the negative correlation between the trend and irregular components acts to smooth the $\{y_t\}$ sequence.

3. The trend is a random walk plus drift. Denote the trend at t by $\mu_t$ so that $\mu_t = y_t + (\beta_1 + \beta_2)\varepsilon_t + \beta_2\varepsilon_{t-1}$. Hence,

$$\Delta\mu_t = \Delta y_t + (\beta_1 + \beta_2)\Delta\varepsilon_t + \beta_2\Delta\varepsilon_{t-1}$$

$$= (y_t - y_{t-1}) + (\beta_1 + \beta_2)\varepsilon_t - \beta_1\varepsilon_{t-1} - \beta_2\varepsilon_{t-2}$$

Since $y_t - y_{t-1} = a_0 + \varepsilon_t + \beta_1\varepsilon_{t-1} + \beta_2\varepsilon_{t-2}$,

$$\Delta\mu_t = a_0 + (1 + \beta_1 + \beta_2)\varepsilon_t$$

Thus, $\mu_t = \mu_{t-1} + a_0 + (1 + \beta_1 + \beta_2)\varepsilon_t$, so that the trend at t is composed of the drift term $a_0$ plus the white-noise innovation $(1 + \beta_1 + \beta_2)\varepsilon_t$.

Beveridge and Nelson show how to recover the trend and irregular components from the data. In the example at hand, estimate the $\{y_t\}$ series using Box-Jenkins techniques. After the data are differenced, an appropriately identified and estimated ARMA model will yield high-quality estimates of $a_0$, $\beta_1$, and $\beta_2$. Next, obtain $\varepsilon_t$ and $\varepsilon_{t-1}$ as the one-step ahead forecast errors of $y_t$ and $y_{t-1}$, respectively. To obtain these values, use the estimated ARMA model to make in-sample forecasts of each observation of $y_{t-1}$ and $y_t$. The resulting forecast errors become $\varepsilon_t$ and $\varepsilon_{t-1}$. Combining the estimated values of $\beta_1$, $\beta_2$, $\varepsilon_t$, and $\varepsilon_{t-1}$ as in (3.55) yields the irregular component. Repeating for each value of t yields the entire irregular sequence. From (3.55), this irregular component is $y_t$ less the trend; hence, the permanent component can be obtained directly.

The General ARIMA(p, 1, q) Model

The first difference of any ARIMA(p, 1, q) series has the stationary infinite-order moving average representation:

$$y_t - y_{t-1} = a_0 + \varepsilon_t + \beta_1\varepsilon_{t-1} + \beta_2\varepsilon_{t-2} + \cdots$$

As in the earlier example, it is useful to define $e_t = \varepsilon_t + \beta_1\varepsilon_{t-1} + \beta_2\varepsilon_{t-2} + \beta_3\varepsilon_{t-3} + \cdots$, so that it is possible to write the solution for $y_{t+s}$ in the same form as (3.51):

$$y_{t+s} = y_t + a_0 s + \sum_{i=1}^{s} e_{t+i}$$

The next step is to express the $\{e_t\}$ sequence in terms of the various values of the $\{\varepsilon_t\}$ sequence. In this general case, (3.52) becomes

$$\sum_{i=1}^{s} e_{t+i} = \sum_{i=1}^{s}\varepsilon_{t+i} + \beta_1\sum_{i=1}^{s}\varepsilon_{t-1+i} + \beta_2\sum_{i=1}^{s}\varepsilon_{t-2+i} + \beta_3\sum_{i=1}^{s}\varepsilon_{t-3+i} + \cdots \tag{3.56}$$

Since $E_t\varepsilon_{t+i} = 0$, it follows that the forecast function can be written as

$$E_t y_{t+s} = y_t + a_0 s + \left(\sum_{i=1}^{s}\beta_i\right)\varepsilon_t + \left(\sum_{i=2}^{s+1}\beta_i\right)\varepsilon_{t-1} + \left(\sum_{i=3}^{s+2}\beta_i\right)\varepsilon_{t-2} + \cdots \tag{3.57}$$

Now, to find the stochastic trend, take the limiting value of the forecast $E_t(y_{t+s} - a_0 s)$ as s becomes infinitely large. As such, the stochastic trend is

$$y_t + \left(\sum_{i=1}^{\infty}\beta_i\right)\varepsilon_t + \left(\sum_{i=2}^{\infty}\beta_i\right)\varepsilon_{t-1} + \left(\sum_{i=3}^{\infty}\beta_i\right)\varepsilon_{t-2} + \cdots$$

The key to operationalizing the decomposition is to recognize that $y_{t+s}$ can be written as

$$y_{t+s} = \Delta y_{t+s} + \Delta y_{t+s-1} + \Delta y_{t+s-2} + \cdots + \Delta y_{t+1} + y_t$$

As such, the trend can always be written as the current value of $y_t$ plus the sum of all the forecasted changes in the sequence. If we abstract from $a_0 s$, the stochastic portion of the trend is

$$\lim_{s\to\infty} E_t y_{t+s} = \lim_{s\to\infty} E_t\left[(y_{t+s} - y_{t+s-1}) + (y_{t+s-1} - y_{t+s-2}) + \cdots + (y_{t+2} - y_{t+1}) + (y_{t+1} - y_t)\right] + y_t$$

$$= \lim_{s\to\infty} E_t(\Delta y_{t+s} + \Delta y_{t+s-1} + \cdots + \Delta y_{t+2} + \Delta y_{t+1}) + y_t \tag{3.58}$$
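The limit in (3.58) can be approximated by truncating the sum of forecasted changes at a large horizon. As a sketch, suppose (purely for illustration) that the differenced series follows an AR(1) process $\Delta y_t = c + \phi\Delta y_{t-1} + \varepsilon_t$; the values of `c`, `phi`, `y_t`, and `dy_t` below are hypothetical. For this case the limit has a closed form, which gives something to compare the truncated sum against.

```python
# Hypothetical AR(1) model for the first difference: dy_t = c + phi*dy_{t-1} + eps_t
c, phi = 0.005, 0.4
drift = c / (1 - phi)            # long-run mean change per period

def stochastic_trend(y_t, dy_t, s=100):
    """Approximate the stochastic trend as y_t plus the sum of the
    s forecasted changes, net of the deterministic change drift*s."""
    total = 0.0
    f = dy_t
    for _ in range(s):
        f = c + phi * f          # E_t dy_{t+i}, built from E_t dy_{t+i-1}
        total += f
    return y_t + total - drift * s

y_t, dy_t = 2.0, 0.03            # hypothetical current level and change
approx = stochastic_trend(y_t, dy_t)

# Closed form for this AR(1) case: y_t + phi/(1-phi) * (dy_t - drift)
exact = y_t + phi / (1 - phi) * (dy_t - drift)
irregular = y_t - approx

print(approx, exact, irregular)
```

Because the forecasts decay geometrically toward the mean change, the truncated sum at s = 100 is numerically indistinguishable from the limit when $\phi$ is well below one.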
$00 


The useful feature of (3.58) is that the Box-Jenkins method allows you to calculate each value of $E_t\Delta y_{t+s}$. For each observation in your data set, find all s-step ahead forecasts and construct the sum given by (3.58). Since the irregular component is $y_t$ minus the sum of the deterministic and stochastic trends, the irregular component can be constructed as

$$y_t - \lim_{s\to\infty}(E_t y_{t+s} - a_0 s) = -\lim_{s\to\infty}\left[E_t(\Delta y_{t+s} + \Delta y_{t+s-1} + \cdots + \Delta y_{t+2} + \Delta y_{t+1}) - a_0 s\right]$$
Soo se 


Thus, to use the Beveridge and Nelson (1981) technique:

STEP 1: Estimate the first difference of the series using the Box-Jenkins technique. Select the best-fitting ARMA(p, q) model of the $\{\Delta y_t\}$ sequence.

STEP 2: Using the best-fitting ARMA model, for each time period $t = 1, \ldots, T$, find the one-step ahead, two-step ahead, ..., s-step ahead forecasts; that is, find $E_t\Delta y_{t+s}$ for each value of t and s. For each value of t, use these forecasted values to construct the sums $E_t(\Delta y_{t+s} + \Delta y_{t+s-1} + \cdots + \Delta y_{t+1}) + y_t$. In practice, it is necessary to find a reasonable approximation to (3.58); in their own work, Beveridge and Nelson let s = 100. For example, for the first usable observation (i.e., t = 1), find the sum:

$$\mu_1 = E_1(\Delta y_{101} + \Delta y_{100} + \cdots + \Delta y_2) + y_1$$

The value of $y_1$ plus the sum of these forecasted changes equals $E_1 y_{101}$, so that the stochastic portion of trend in period 1 is $E_1 y_{101} - a_0 s$ and the deterministic portion $a_0 s$. Similarly, for t = 2, construct

$$\mu_2 = E_2(\Delta y_{102} + \Delta y_{101} + \cdots + \Delta y_3) + y_2$$

If there are T observations in your data set, the trend component for the last period is

$$\mu_T = E_T(\Delta y_{T+100} + \Delta y_{T+99} + \cdots + \Delta y_{T+1}) + y_T$$

The entire sequence of constructed trends (i.e., $\mu_1, \mu_2, \ldots, \mu_T$) constitutes the $\{\mu_t\}$ sequence.


STEP 3: Form the irregular component at t by subtracting the stochastic portion of the trend at t from the value of $y_t$. Thus, for each observation t, the irregular component is $y_t - (E_t y_{t+100} - 100a_0) = -E_t(\Delta y_{t+100} + \Delta y_{t+99} + \cdots + \Delta y_{t+1}) + 100a_0$.

Note that for many series, the value of s can be quite small. For example, in the ARIMA(0, 1, 2) model of (3.48), the value of s can be set equal to 2 since all forecasted changes for $s > 2$ are equal to zero (apart from the drift $a_0$). If the ARMA model that is estimated in Step 1 has slowly decaying autoregressive components, the value of s should be large enough so that the s-step ahead forecasts converge to the deterministic change $a_0$.

An Example

In Section 9, the natural log of the yen was estimated as the ARIMA(0, 1, 1) process:

$$\Delta y_t = -0.0116 + (1 + 0.3686L)\varepsilon_t$$

where $\Delta y_t$ = the logarithmic change in the yen/dollar exchange rate.

Step 2 requires that for each observation, we form the one-step through s-step ahead forecasts. For this model, the mechanics are trivial since for each period t the one-step forecast is

$$E_t\Delta y_{t+1} = -0.0116 + 0.3686\varepsilon_t$$

and all other s-step ahead forecasts are -0.0116. Thus, for each observation t, the summation $E_t(\Delta y_{t+100} + \Delta y_{t+99} + \cdots + \Delta y_{t+1})$ is equal to $-100(0.0116) + 0.3686\varepsilon_t$. For example, for 1973:Q2 (the first usable observation in the sample), the stochastic portion of the trend is $y_{1973:Q2} + 0.3686\varepsilon_{1973:Q2}$ and the temporary portion of $y_{1973:Q2}$ is $-0.3686\varepsilon_{1973:Q2}$. Repeating for each point in the data set yields the irregular and permanent components of the sequence. Figure 3.15 shows the temporary and the permanent portions of the series. As you can clearly see, the trend dominates the movements in the irregular component. Hence, nearly all changes in the yen are permanent changes.

The estimated ARIMA(0, 1, 1) model is the special case of (3.48), in which $\beta_2$ is set equal to zero. As such, you should be able to write the equivalent of (3.49) to (3.55) for the yen/dollar exchange rate.

An Alternative Decomposition

The Beveridge and Nelson (1981) decomposition has proven especially useful in that it provides a straightforward method to decompose any ARIMA(p, 1, q) process into a temporary and permanent component. However, it is important to note that the Beveridge and Nelson decomposition is not unique. Equations (3.54) and (3.55) provide an example in which the Beveridge and Nelson decomposition forces the innovation in the trend and stationary components to be perfectly correlated.
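The three steps of the Beveridge and Nelson procedure can be sketched for the estimated yen/dollar model. The coefficients -0.0116 and 0.3686 are those reported above; the shocks, however, are simulated with an arbitrary standard deviation because the actual exchange rate data and fitted residuals are not reproduced here, so the output illustrates only the mechanics.

```python
import numpy as np

a0, beta1 = -0.0116, 0.3686          # estimates reported in the text
T, s = 70, 100                       # sample size and forecast horizon

rng = np.random.default_rng(2)       # arbitrary seed (assumption)
# Hypothetical shocks standing in for the fitted residuals; eps[0] is pre-sample
eps = rng.normal(0.0, 0.05, size=T + 1)

dy = a0 + eps[1:] + beta1 * eps[:-1]   # changes for t = 1..T
y = np.cumsum(dy)                      # log level, taking y_0 = 0

# Step 2: s-step ahead forecasts of the changes made at each t
forecasts = np.full((T, s), a0)        # all horizons beyond one equal a0
forecasts[:, 0] += beta1 * eps[1:]     # one-step forecast adds beta1*eps_t
sum_forecasts = forecasts.sum(axis=1)  # equals s*a0 + beta1*eps_t

# Stochastic portion of the trend: y_t plus forecasted changes, net of a0*s
trend = y + sum_forecasts - a0 * s
# Step 3: irregular component
irregular = y - trend

# Correlation between the trend innovation and the irregular component
corr = np.corrcoef(np.diff(trend) - a0, irregular[1:])[0, 1]
print(corr)
```

In this sketch the irregular component equals $-0.3686\varepsilon_t$ and the trend changes by $a_0 + 1.3686\varepsilon_t$ each period, so the two innovations are perfectly negatively correlated by construction.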


Figure 3.15 Decomposition of the yen/dollar exchange rate (natural logarithm): temporary component and permanent component.


In fact, this result applies to the more general ARIMA(p, 1, q) model. Obtaining the irregular as the difference between $y_t$ and its trend forces the correlation coefficient between the innovations to equal -1. However, there is no reason to constrain the two innovations in the two components to be perfectly correlated. To illustrate the point using a specific example, consider the trend plus noise model of (3.41):

$$y_t = \mu_t + \eta_t \tag{3.59}$$

$$\mu_t = a_0 + \mu_{t-1} + \varepsilon_t \tag{3.60}$$

where $E\varepsilon_t\eta_t = 0$.

To derive the forecast function, update (3.43) by s periods to obtain

$$y_{t+s} = y_0 - \eta_0 + a_0(t + s) + \sum_{i=1}^{t+s}\varepsilon_i + \eta_{t+s}$$

so that

$$y_{t+s} = y_t + a_0 s + \sum_{i=1}^{s}\varepsilon_{t+i} + \eta_{t+s} - \eta_t$$


The forecast function for all $s > 0$ is such that $E_t y_{t+s} = y_t + a_0 s - \eta_t$; hence, the stochastic level is $y_t - \eta_t$. Thus, the stochastic trend at t is $y_t - \eta_t = \mu_t$, so that the irregular component is $\eta_t$. The trend and irregular components are uncorrelated since $E(y_t - \eta_t)\eta_t = E\mu_t\eta_t = 0$. Thus, the Beveridge and Nelson methodology would incorrectly identify the trend and irregular since it would force the two innovations to be perfectly correlated.

Now consider the correct way to identify the two components in (3.59) and (3.60). In Section 9, this trend plus noise model was shown to have an equivalent ARIMA(0, 1, 1) representation such that

$$E\Delta y_t = a_0; \quad \text{var}(\Delta y_t) = \sigma^2 + 2\sigma_\eta^2; \quad \text{and} \quad \text{cov}(\Delta y_t, \Delta y_{t-1}) = -\sigma_\eta^2 \tag{3.61}$$

Hence, it is possible to represent (3.59) and (3.60) as the MA(1) process:

$$\Delta y_t = a_0 + e_t + \beta_1 e_{t-1} \tag{3.62}$$

where $e_t$ = an independent white-noise disturbance.

The notation $e_t$ is designed to indicate that shocks to $\Delta y_t$ come from two sources: $\varepsilon_t$ and $\eta_t$. The problem is to decompose the estimated values of $\{e_t\}$ into these two source components.

In this instance, it is possible to recover, or identify, the individual $\{\varepsilon_t\}$ and $\{\eta_t\}$ shocks from the estimation of (3.62). The appropriate use of the Box-Jenkins methodology will yield estimates of $a_0$, $\beta_1$, and the elements of the $\{e_t\}$ sequence. If we use these estimates, it is possible to form

$$\text{var}(\Delta y_t) = \text{var}(e_t + \beta_1 e_{t-1}) = (1 + \beta_1^2)\,\text{var}(e_t)$$

and

$$\text{cov}(\Delta y_t, \Delta y_{t-1}) = \beta_1\,\text{var}(e_t)$$

However, these estimates of the variance and covariance are not arbitrary; for (3.62) to satisfy the restrictions of (3.61), it must be the case that

$$(1 + \beta_1^2)\,\text{var}(e_t) = \sigma^2 + 2\sigma_\eta^2$$

$$\beta_1\,\text{var}(e_t) = -\sigma_\eta^2$$

Now that we have estimated $\beta_1$ and $\text{var}(e_t)$, it is possible to recover $\sigma^2$ and $\sigma_\eta^2$ from the data. The individual values of the $\{\varepsilon_t\}$ and $\{\eta_t\}$ sequences can be recovered as well. From the forecast function, $E_t y_{t+1} = y_t + a_0 - \eta_t$. Hence, it is possible to use one-step ahead forecasts from (3.62) to find $E_t\Delta y_{t+1} = a_0 + \beta_1 e_t$, so that $E_t y_{t+1} = y_t + a_0 + \beta_1 e_t$. Since the two forecasts must be equivalent, it follows that

$$\beta_1 e_t = -\eta_t$$

Thus, the estimated values of $\beta_1 e_t$ can be used to identify the entire $\{\eta_t\}$ sequence. Given $\{e_t\}$ and $\{\eta_t\}$, the values of $\{\varepsilon_t\}$ can be obtained from $\Delta y_t = a_0 + \varepsilon_t + \Delta\eta_t$. For each value of t, form $\varepsilon_t = \Delta y_t - a_0 - \Delta\eta_t$ using the known values of $\Delta y_t$ and the estimated values of $a_0$ and $\Delta\eta_t$.
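The two restrictions linking the MA(1) parameters to the component variances can be solved in a few lines. The sketch below uses hypothetical function names and illustrative variances; it first maps $(\sigma^2, \sigma_\eta^2)$ into the implied invertible MA(1) parameters and then inverts the mapping, which is the recovery step described above.

```python
import math

def ma1_from_components(sigma2, sigma2_eta):
    """Invertible MA(1) parameters implied by the trend plus noise model
    (requires sigma2 > 0 and sigma2_eta > 0)."""
    gamma0 = sigma2 + 2 * sigma2_eta      # var(dy_t)
    gamma1 = -sigma2_eta                  # cov(dy_t, dy_{t-1})
    rho1 = gamma1 / gamma0
    # Solve rho1 = beta1 / (1 + beta1^2), keeping the root with |beta1| < 1
    beta1 = (1 - math.sqrt(1 - 4 * rho1**2)) / (2 * rho1)
    var_e = gamma0 / (1 + beta1**2)
    return beta1, var_e

def components_from_ma1(beta1, var_e):
    """Recover sigma^2 and sigma_eta^2 from estimated beta1 and var(e_t)."""
    sigma2_eta = -beta1 * var_e
    sigma2 = (1 + beta1**2) * var_e - 2 * sigma2_eta
    return sigma2, sigma2_eta

# Illustrative variances (assumptions, not estimates from the text)
beta1, var_e = ma1_from_components(1.0, 0.5)
sigma2, sigma2_eta = components_from_ma1(beta1, var_e)
print(beta1, var_e, sigma2, sigma2_eta)
```

Note that the implied $\beta_1$ is negative whenever $\sigma_\eta^2 > 0$, which is what allows $-\beta_1\,\text{var}(e_t)$ to be interpreted as a variance.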

The point is that it is possible to decompose a series such that the correlation between the trend and irregular components is zero. The example illustrates an especially important point. To decompose a series into a random walk plus drift and stationary irregular component, it is necessary to specify the correlation coefficient between innovations in the trend and irregular components. We have seen two ways to decompose an ARIMA(0, 1, 1) model. In terms of (3.59) and (3.60), the Beveridge and Nelson technique adds the restriction that

$$E\varepsilon_t\eta_t/(\sigma\sigma_\eta) = 1$$

so that the innovations are perfectly correlated, while the second decomposition adds the restriction:

$$E\varepsilon_t\eta_t = 0$$

In fact, the correlation coefficient between the two components can be any number in the interval -1 to +1. Without the extra restriction concerning the correlation between the innovations, the trend and stationary components cannot be identified; in a sense, we are an equation short. This result carries over to more complicated models since it is always necessary to “cleave” or “partition” the contemporaneous movement of a series into its two constituent parts. The problem is important because economic theory does not always provide the relationship between the two innovations. However, without a priori knowledge of the relationship between innovations in the trend and stationary components, the decomposition of a series into a random walk plus drift and a stationary component is not unique.

What if $\varepsilon_t$ and $\eta_t$ are uncorrelated, but you incorrectly use a Beveridge and Nelson (1981) decomposition to obtain the temporary and permanent components? Clearly, the in-sample forecasts are invariant to the form of the decomposition. Equation (3.58) has an ARIMA(0, 1, 1) representation that you should properly capture using Step 1 of the Beveridge and Nelson method. As such, there is no way for you to determine that the assumption of perfectly correlated innovations is incorrect. The issue has nothing to do with the correct form of the ARIMA model; rather, the problem is the way in which the innovations in the trend and irregular components are partitioned.


What will the researcher incorrectly partitioning the variances find? Using the
Beveridge and Nelson decomposition for an ARIMA(0, 1, 1) model (see (3.48)
and (3.54) with $\beta_2 = 0$), the researcher will set the irregular component equal
to $-\beta_1$ multiplied by the entire innovation in $y_t$ (i.e., $\varepsilon_t + \eta_t$). In fact, if the
innovations are uncorrelated, the actual value of the irregular component is $\eta_t$.
The dilemma is that it is not possible to identify the "true" model using sample data.

Watson decomposes the logarithm of GNP under the two alternative assumptions
concerning the innovations in the trend and irregular components. Using a Beveridge
and Nelson decomposition, he estimates the ARIMA(1, 1, 0) model (with standard
errors in parentheses):

$$\Delta y_t = \underset{(0.001)}{0.005} + \underset{(0.077)}{0.406}\,\Delta y_{t-1} + \varepsilon_t, \qquad \sigma = 0.0103$$

Assuming that innovations in the trend and irregular components are uncorrelated,
Watson estimates

$$\Delta\mu_t = \underset{(0.001)}{0.008} + \varepsilon_t, \qquad \sigma = 0.0057$$

$$A(L)\eta_t = (1 - \underset{(0.121)}{1.501}L + \underset{(0.125)}{0.577}L^2)\eta_t, \qquad \sigma = 0.0076$$

where the second standard deviation refers to the innovation in the irregular equation.

The short-term forecasts of the two models are quite similar. The standard error
of the one-step ahead forecast of this second model is slightly smaller than that of
the first: $(0.0057^2 + 0.0076^2)^{1/2} = 0.0095$ is slightly smaller than 0.0103. However,
the long-run properties of the two models are quite different. For example, writing
$\Delta y_t = (0.005 + \varepsilon_t)/(1 - 0.406L)$ yields the impulse response function using the
Beveridge and Nelson decomposition. The sum of the coefficients for this impulse
response function is 1.68. Hence, a one-unit innovation will eventually increase
log(GNP) by a full 1.68 units. Since all coefficients are positive, following the
initial shock, $y_t$ steadily increases to its new level. In contrast, the second model
acts as an ARIMA(0, 1, 2) such that the sum of the impulse response coefficients is
only 0.57. All coefficients beginning with lag 4 are negative. As such, a one-unit
innovation in $y_t$ has a larger effect in the short run than the long run.
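The two figures quoted for the comparison (a long-run response of 1.68 for the Beveridge and Nelson model and a one-step standard error of 0.0095 for the uncorrelated-innovations model) can be checked with a few lines of arithmetic. The following is only a sketch: the function name `level_response` and the 40-period horizon are illustrative choices, not part of the text, and the impulse responses of the second model are not reproduced because they depend on details not given here.

```python
import math

def level_response(phi, horizon):
    """Response of y_{t+j}, j = 0..horizon, to a one-unit shock when the
    first difference follows an AR(1): dy_t = phi * dy_{t-1} + e_t."""
    responses, level, diff_response = [], 0.0, 1.0
    for _ in range(horizon + 1):
        level += diff_response      # the level accumulates responses of dy
        responses.append(level)
        diff_response *= phi        # AR(1) decay of the differenced series
    return responses

# Beveridge and Nelson model: dy_t = 0.005 + 0.406 dy_{t-1} + e_t
bn_path = level_response(0.406, 40)
bn_long_run = 1.0 / (1.0 - 0.406)   # sum of the impulse response coefficients

# Uncorrelated innovations: the one-step forecast error combines both shocks
uc_one_step_se = math.sqrt(0.0057 ** 2 + 0.0076 ** 2)

print(round(bn_long_run, 2))        # 1.68
print(round(uc_one_step_se, 4))     # 0.0095
```

Because every coefficient of the first model is positive, `bn_path` rises monotonically toward `bn_long_run`, which is the "steadily increases to its new level" behavior described above.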


SUMMARY AND CONCLUSIONS

Conditionally heteroskedastic models allow the conditional variance of a series to
depend on the past realizations of the error process. A large realization of the
current period's disturbance increases the conditional variance in subsequent periods.
For a stable process, the conditional variance will eventually decay to the long-run
variance. As such, ARCH and GARCH models can capture periods of turbulence and
tranquility. The basic GARCH model has been extended to allow for a unit root in
the conditional variance.


196 Modeling Economic Time Series: Trends and Volatility 


Such an integrated GARCH (IGARCH) process allows for shocks to have a perma-
nent effect on the conditional variance. 

Conditional variance is a measure of risk. ARCH and GARCH effects have been 
included in a regression framework to test hypotheses involving risk-averse agents. 
For example, if producers are risk-averse, conditional price variability will affect 
product supply. Producers may reduce their exposure by withdrawing from the 
market in periods of substantial risk. Similarly, asset prices should be negatively re- 
lated to their conditional volatility. Such ARCH effects in the mean of a series 
(ARCH-M) are a natural implication of asset-pricing models. The basic GARCH 
model has been extended to allow the conditional variance to have a unit root. This 
integrated GARCH, or IGARCH, process is discussed in Engle and Bollerslev 
(1986). 
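A small numerical sketch may help contrast a stable conditional variance with the unit root (IGARCH) case. It uses the ARCH(1) form $h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2$; the parameter values $\alpha_0 = 1$ and $\alpha_1 = 0.8$ are borrowed from the simulated series in Question 6, while the shock size, the horizon, and the function names are assumptions made purely for illustration.

```python
def expected_variance_path(alpha0, alpha1, shock, horizon):
    """E_t h_{t+j} for j = 1..horizon in the ARCH(1) model
    h_t = alpha0 + alpha1 * eps_{t-1}^2, given eps_t = shock."""
    h = alpha0 + alpha1 * shock ** 2
    path = [h]
    for _ in range(horizon - 1):
        h = alpha0 + alpha1 * h     # uses E_t eps_{t+j}^2 = E_t h_{t+j}
        path.append(h)
    return path

def shock_effect(alpha0, alpha1, shock, horizon):
    """Difference between the path after a shock and the path after no shock."""
    shocked = expected_variance_path(alpha0, alpha1, shock, horizon)
    baseline = expected_variance_path(alpha0, alpha1, 0.0, horizon)
    return [s - b for s, b in zip(shocked, baseline)]

# Stable case: the conditional variance decays to alpha0 / (1 - alpha1)
stable = expected_variance_path(1.0, 0.8, shock=5.0, horizon=60)
long_run_variance = 1.0 / (1.0 - 0.8)
stable_effect = shock_effect(1.0, 0.8, 5.0, 60)

# Unit root in the conditional variance: the effect of the shock never decays
unit_root_effect = shock_effect(1.0, 1.0, 5.0, 60)
```

In the stable case `stable_effect` shrinks geometrically at rate $\alpha_1$, whereas with $\alpha_1 = 1$ every element of `unit_root_effect` equals the squared shock, which is the sense in which shocks are permanent.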

Nonstationarity due to a time-dependent mean and/or variance is another common
feature of economic time series. The trend in a series can contain both stochastic
and deterministic components. Differencing can remove a stochastic trend and
“detrending” can eliminate a deterministic trend. However, it is inappropriate to
difference a trend stationary series or to detrend a series containing a stochastic
trend. The resultant irregular component of the series can be estimated using
Box-Jenkins techniques. 
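The distinction can be illustrated with two artificial series built from the same shocks. This is a sketch under assumed values (drift and slope of 0.5, 200 observations, an arbitrary seed); the helper names `difference` and `detrend` are hypothetical and are not taken from the text.

```python
import random

def difference(series):
    """First difference: y_t - y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]

def detrend(series):
    """Residuals from an OLS regression of the series on a constant and t."""
    n = len(series)
    t_bar = (n - 1) / 2.0
    y_bar = sum(series) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(series))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return [y - intercept - slope * t for t, y in enumerate(series)]

rng = random.Random(0)              # arbitrary seed
eps = [rng.gauss(0.0, 1.0) for _ in range(200)]

# Stochastic trend: random walk with drift, y_t = y_{t-1} + 0.5 + eps_t
random_walk = [0.0]
for e in eps[1:]:
    random_walk.append(random_walk[-1] + 0.5 + e)

# Deterministic trend: z_t = 1 + 0.5 t + eps_t
trend_stationary = [1.0 + 0.5 * t + e for t, e in enumerate(eps)]

d_rw = difference(random_walk)       # equals 0.5 + eps_t: stationary
d_ts = difference(trend_stationary)  # equals 0.5 + eps_t - eps_{t-1}
resid_ts = detrend(trend_stationary) # appropriate treatment of z_t
resid_rw = detrend(random_walk)      # still contains the accumulated shocks
```

Differencing the random walk recovers the drift plus the original shock, while differencing the trend stationary series produces the moving average term $\varepsilon_t - \varepsilon_{t-1}$, which is why the two operations are not interchangeable.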

In contrast to traditional theory, the consensus view is that most macroeconomic
time series contain a stochastic trend. Decomposing real GNP into its permanent
and temporary components, as in Beveridge and Nelson (1981), indicates that
innovations in the stochastic trend account for a sizable proportion of the
period-to-period movements. However, the Beveridge and Nelson decomposition is not
unique in that it forces the correlation coefficient between innovations in the trend
and irregular components to be unity. Some of the issues are considered in the
appendix to this chapter. In a very technical paper, Quah (1992) takes the issue one
step further: He proves that the random walk plus drift model is not a unique form
for the trend. In Chapter 5, you will be shown a multivariate technique that allows
for a unique decomposition of a series into its temporary and permanent components.


QUESTIONS AND EXERCISES 


1. Consider the ARCH-M model represented by Equations (3.30) to (3.32).
Recall that $\{\varepsilon_t\}$ is a white-noise disturbance; for simplicity, let
$E\varepsilon_t^2 = E\varepsilon_{t-1}^2 = \cdots = 1$.

A. Find the unconditional mean $Ey_t$. How does a change in $\delta$ affect the mean?
Using the example of Section 6, show that changing $\beta$ and $\delta$ from (-4, 4) to
(-1, 1) preserves the mean of the $\{y_t\}$ sequence.

B. Show that the unconditional variance of $y_t$ when $h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2$ does not
depend on $\beta$, $\delta$, or $\alpha_0$.




2. Suppose that the $\{\varepsilon_t\}$ sequence is generated by the ARCH($q$) process

$$\varepsilon_t = v_t\left(\alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \cdots + \alpha_q\varepsilon_{t-q}^2\right)^{1/2}$$

Show that the conditional expectation $E_{t-1}\varepsilon_t^2$ has the same form as the
conditional expectation of (3.1).


3. Bollerslev (1986) proves that the ACF of the squared residuals resulting from
the GARCH($p$, $q$) process represented by (3.9) acts as an ARMA($m$, $p$)
process, where $m = \max(p, q)$. You are to illustrate this result using the
examples below.

A. Consider the GARCH(1, 2) process $h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \alpha_2\varepsilon_{t-2}^2 + \beta_1h_{t-1}$. Add
the expression $(\varepsilon_t^2 - h_t)$ to each side, so that

$$\varepsilon_t^2 = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \alpha_2\varepsilon_{t-2}^2 + \beta_1h_{t-1} + (\varepsilon_t^2 - h_t)$$

$$\phantom{\varepsilon_t^2} = \alpha_0 + (\alpha_1 + \beta_1)\varepsilon_{t-1}^2 + \alpha_2\varepsilon_{t-2}^2 - \beta_1(\varepsilon_{t-1}^2 - h_{t-1}) + (\varepsilon_t^2 - h_t)$$

Define $\eta_t = (\varepsilon_t^2 - h_t)$, so that

$$\varepsilon_t^2 = \alpha_0 + (\alpha_1 + \beta_1)\varepsilon_{t-1}^2 + \alpha_2\varepsilon_{t-2}^2 - \beta_1\eta_{t-1} + \eta_t$$

Show that:

i. $\eta_t$ is serially uncorrelated.

ii. The $\{\varepsilon_t^2\}$ sequence acts as an ARMA(2, 1) process.

B. Consider the GARCH(2, 1) process $h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \beta_1h_{t-1} + \beta_2h_{t-2}$. Show
that it is possible to add $\eta_t$ to each side so as to obtain

$$\varepsilon_t^2 = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \beta_1h_{t-1} + \beta_2h_{t-2} + \eta_t$$

Show that adding and subtracting the terms $\beta_1\eta_{t-1}$ and $\beta_2\eta_{t-2}$ to the right-
hand side of this equation yields an ARMA(2, 2) process.

C. Provide an intuitive explanation of the statement: “The Lagrange multiplier
test for ARCH errors cannot be used to test the null of white-noise squared
residuals against an alternative of a specific GARCH($p$, $q$) process.”

D. Sketch the proof of the general statement that the ACF of the squared resid-
uals resulting from the GARCH($p$, $q$) process represented by (3.9) acts as
an ARMA($m$, $p$) process, where $m = \max(p, q)$.

4. Given an initial condition for $y_0$, find and interpret the forecast function for
each of the following models:

A. $y_t = y_{t-1} + \varepsilon_t + 0.5\varepsilon_{t-1}$
B. $y_t = 1.1y_{t-1} + \varepsilon_t$
C. $y_t = y_{t-1} + 1 + \varepsilon_t$
D. $y_t = y_{t-1} + t + \varepsilon_t$
E. $y_t = \mu_t + \eta_t + 0.5\eta_{t-1}$, where $\mu_t = \mu_{t-1} + \varepsilon_t$
F. $y_t = \mu_t + \eta_t + 0.5\eta_{t-1}$, where $\mu_t = 0.5 + \mu_{t-1} + \varepsilon_t$
G. How can you make the models of parts B and D stationary?
H. Does model E have an ARIMA($p$, 1, $q$) representation?

5. Let $y_0 = 0$ and the first five realizations of the $\{\varepsilon_t\}$ sequence be (1, -1, -2, 1,
1). Plot each of the following sequences:

Model 1: $y_t = 0.5y_{t-1} + \varepsilon_t$
Model 2: $y_t = \varepsilon_t - \varepsilon_{t-1}^2$
Model 3: $y_t = 0.5y_{t-1} + \varepsilon_t - \varepsilon_{t-1}^2$

A. How does the ARCH-M specification affect the behavior of the $\{y_t\}$ se-
quence? What is the influence of the autoregressive term in model 3?

B. For each of the three models, calculate the sample mean and variance of
$\{y_t\}$.

6. The file labeled ARCH.WK1 contains the 100 realizations of the simulated
$\{y_t\}$ sequence used to create the lower right-hand graph of Figure 3.9. Recall
that this series was simulated as $y_t = 0.9y_{t-1} + \varepsilon_t$, where $\varepsilon_t$ is the ARCH(1) error
process $\varepsilon_t = v_t(1 + 0.8\varepsilon_{t-1}^2)^{1/2}$. You should find that the series has a mean of
0.263369480, a standard deviation of 4.89409139 with minimum and maxi-
mum values of -10.8 and 15.15, respectively. 


A. Estimate the series using OLS and save the residuals. You should obtain

$$y_t = 0.9444053245\,y_{t-1} + \varepsilon_t$$

The t-statistic for $a_1$ is 26.50647.

Note that the estimated value of $a_1$ differs from the theoretical value of 0.9.
This is due to nothing more than sampling error; the simulated values of $\{v_t\}$
do not precisely conform to the theoretical distribution. However, can you pro-
vide an intuitive explanation of why positive serial correlation in the $\{v_t\}$ se-
quence might shift the estimate of $a_1$ upward in small samples?


B. Plot the ACF and PACF of the residuals. Use Ljung-Box Q-statistics to de-
termine whether the residuals approximate white noise. You should find

ACF of the residuals:
 1:  0.1489160  0.0044162 -0.0178424 -0.0124788  0.0682729  0.0028705
 7: -0.0994202 -0.1508656  0.0643873  0.1012332  0.0898023 -0.037916

PACF of the residuals:
 1:  0.1489160 -0.0181625 -0.0161712 -0.0074713  0.0727149 -0.0192058
 7: -0.0996379 -0.1234779 -0.1315448  0.0731477  0.0606913 -0.0565566

Ljung-Box Q-statistics: Q(4)  =  2.3142, significance level 0.50980859
                        Q(8)  =  6.3861, significance level 0.49546069
                        Q(24) = 18.4914, significance level 0.73031863


C. Plot the ACF and PACF of the squared residuals. You should find

ACF of the squared residuals:
 1:  0.4730473  0.1268669 -0.0573466 -0.0777808  0.0570613  0.2424039
 7:  0.2727332  0.2140628  0.1368675 -0.0053388 -0.0660162 -0.0942429

PACF of the squared residuals:
 1:  0.4730473 -0.1248437 -0.0861060  0.0037908  0.1351502  0.1981716
 7:  0.0702680  0.0620095  0.0682656 -0.0656655 -0.0381717 -0.1030398

Ljung-Box Q-statistics: Q(4)  = 25.4702, significance level 0.00001231
                        Q(8)  = 45.2535, significance level 0.00000012
                        Q(24) = 50.6029, significance level 0.00076745

Based on the ACF and PACF of the residuals and squared residuals, what
can you conclude about the presence of ARCH errors?


D. Estimate the squared residuals as: $\varepsilon_t^2 = \alpha_0 + \alpha_1\varepsilon_{t-1}^2$. You should verify

Coefficient  Estimate      Standard Error  t-Statistic  Significance
α_0          1.5501077352  0.5484906416    2.82613      0.00573246
α_1          0.4745095418  0.0899397119    5.27586      0.00000082

Show that the Lagrange multiplier test statistic for ARCH(1) errors is
$TR^2 = 22.027771$ with a significance level of 0.00000269.


E. For comparison purposes, estimate the squared residuals as an ARCH(4)
process. You should find

Coefficient  Estimate      Standard Error  t-Statistic  Significance
α_0           1.934317326  0.653781567      2.95866     0.00394756
α_1           0.520622481  0.105584787      4.93085     0.00000372
α_2          -0.079036621  0.118547940     -0.66671     0.50666555
α_3          -0.089127597  0.118593767     -0.75154     0.45429036
α_4           0.004812599  0.105446847      0.04564     0.96369827

Why is this ARCH(4) model inappropriate?


F. Simultaneously estimate the $\{y_t\}$ sequence and ARCH(1) error process us-
ing maximum likelihood estimation. You should find

Coefficient  Estimate      Standard Error  t-Statistic  Significance
a_1          0.8864631666  0.0270362312    32.78797     0.00000000
α_0          1.1735726519  0.2703953538     4.34021     0.00001423
α_1          0.6663896955  0.2221985284     2.99907     0.00270802
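The Lagrange multiplier test in part D of Question 6 can be sketched as follows. The data file is not reproduced here, so the example simulates a series from the stated process ($y_t = 0.9y_{t-1} + \varepsilon_t$ with $\varepsilon_t = v_t(1 + 0.8\varepsilon_{t-1}^2)^{1/2}$) using an arbitrary seed; its numbers will therefore differ from those reported in the exercise. All function names are hypothetical, and $T$ is taken to be the number of usable observations in the auxiliary regression.

```python
import math
import random

def simulate_ar1_arch1(n, seed):
    """Simulate y_t = 0.9 y_{t-1} + eps_t, eps_t = v_t * sqrt(1 + 0.8 eps_{t-1}^2)."""
    rng = random.Random(seed)
    y, eps_prev, y_prev = [], 0.0, 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0) * math.sqrt(1.0 + 0.8 * eps_prev ** 2)
        y_prev = 0.9 * y_prev + eps
        y.append(y_prev)
        eps_prev = eps
    return y

def ols_ar1_residuals(y):
    """Residuals from the OLS regression y_t = a1 * y_{t-1} + e_t (no intercept)."""
    a1 = sum(c * p for p, c in zip(y, y[1:])) / sum(p * p for p in y[:-1])
    return [c - a1 * p for p, c in zip(y, y[1:])]

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def arch1_lm(residuals):
    """Return (T * R^2, significance level) from regressing e_t^2 on a
    constant and e_{t-1}^2."""
    sq = [e * e for e in residuals]
    x, z = sq[:-1], sq[1:]
    n = len(z)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    szz = sum((b - z_bar) ** 2 for b in z)
    sxz = sum((a - x_bar) * (b - z_bar) for a, b in zip(x, z))
    r_squared = sxz ** 2 / (sxx * szz)   # R^2 of a one-regressor OLS fit
    stat = n * r_squared
    return stat, chi2_sf_1df(stat)

series = simulate_ar1_arch1(100, seed=1)   # arbitrary seed
lm_stat, lm_significance = arch1_lm(ols_ar1_residuals(series))
```

As a consistency check on the reported figures, `chi2_sf_1df(22.027771)` is approximately 0.0000027, in line with the significance level quoted in part D.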


7. The file WPI.WK1 contains the quarterly values of the U.S. Wholesale Price
Index (WPI) from 1960:Q1 to 1992:Q2. Use the data to construct the logarith-
mic change as

$$\Delta lwpi_t = \log(wpi_t) - \log(wpi_{t-1})$$

You should find:

Series  Observations  Mean       Standard Error  Minimum      Maximum
wpi     130           65.09      31.366          30.50        116.2
Δlwpi   129           0.0101428  0.01452535      -0.02087032  0.06952606

A. Use the entire sample period to estimate Equation (3.19). Perform diagnos-
tic checks to determine whether the residuals appear to be white-noise.

B. Plot the ACF and PACF of the squared residuals.

C. Estimate the various GARCH models given by (3.21), (3.22), and (3.23).


8. Series Y on the file labeled ARCHM.PRN contains 100 observations of a simu-
lated ARCH-M process. The properties of the sequence are

Sample mean  1.06988500000   Variance                   0.267006
Skewness     0.47442         Significance level (Sk=0)  0.05642422

A. Plot the ACF and PACF of the $\{y_t\}$ sequence. You should find that the first
12 values are

ACF:
 1:  0.0115085  0.0316424  0.2320040 -0.0643045 -0.1395873 -0.3094448
 7: -0.0009952 -0.1573020 -0.2247642  0.1861901 -0.0510400  0.0451368

PACF:
 1:  0.0115085  0.0315141  0.2315492 -0.0727560 -0.1616008 -0.3873124
 7:  0.0369360 -0.0659871 -0.0779783  0.1446746 -0.0821942 -0.0051101

Ljung-Box Q-statistics: Q(4)  =  6.2172, significance level 0.18350104
                        Q(12) = 31.5695, significance level 0.00161190
                        Q(24) = 49.8118, significance level 0.00149611


B. Estimate the $\{y_t\}$ sequence using the Box-Jenkins methodology. Try to im-
prove on the model:

$$y_t = a_0 + \varepsilon_t + \beta_3\varepsilon_{t-3} + \beta_6\varepsilon_{t-6}$$

where

Coefficient  Estimate      Standard Error  t-Statistic  Significance
a_0           1.071771081  0.048009924     22.32395     0.00000000
β_3           0.254214138  0.098929960      2.56964     0.01170287
β_6          -0.262006589  0.099273537     -2.63924     0.00968214


C. Examine the ACF and PACF of the residuals from the MA[(3, 6)] model
above. Why might someone conclude that the residuals appear to be white-
noise? Now examine the ACF and PACF of the squared residuals. You
should find

ACF of the squared residuals:
 1:  0.4981203  0.2509847  0.2895971  0.1625192  0.0430988  0.1141240
 7:  0.0907499  0.0532747  0.1365066  0.0261814  0.1592152  0.2503240

PACF of the squared residuals:
 1:  0.4981203  0.0038049  0.2170029 -0.0878890 -0.0413535  0.1013672
 7: -0.0172378  0.0348213  0.0984692 -0.1475101  0.2890676  0.0322684

Ljung-Box Q-statistics: Q(4)  = 43.7460, significance level 0.0000
                        Q(8)  = 46.5766, significance level 0.0000
                        Q(12) = 58.9113, significance level 0.0000
                        Q(24) = 64.5293, significance level 0.0000


D. Estimate the $\{y_t\}$ sequence as the ARCH-M process:

$$y_t = a_0 + a_1h_t + \varepsilon_t$$

$$h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2$$

You should find

Coefficient  Estimate      Standard Error  t-Statistic  Significance
a_0          0.9081809340  0.0646439764    14.04896     0.00000000
a_1          0.6252387171  0.3491817146     1.79058     0.07336030
α_0          0.1079170551  0.0193136878     5.58759     0.00000002
α_1          0.5973791022  0.2387112973     2.50252     0.01233137


E. Check the ACF and PACF of the estimated $\{\varepsilon_t\}$ sequence. Do they appear
to be satisfactory? Experiment with several other simple formulations of
the ARCH-M process. 


9. Consider the ARCH(2) process $E_{t-1}\varepsilon_t^2 = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \alpha_2\varepsilon_{t-2}^2$.

A. Suppose that $y_t = a_0 + a_1y_{t-1} + \varepsilon_t$. Find the conditional and unconditional
variance of $\{y_t\}$ in terms of the parameters $a_1$, $\alpha_0$, $\alpha_1$, and $\alpha_2$.

B. Suppose that $\{y_t\}$ is an ARCH-M process such that the level of $y_t$ is posi-
tively related to its own conditional variance. For simplicity, let
$y_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \alpha_2\varepsilon_{t-2}^2 + \varepsilon_t$. Trace out the impulse response function of $\{y_t\}$ to an
$\{\varepsilon_t\}$ shock. You may assume that the system has been in long-run equilib-
rium ($\varepsilon_{t-2} = \varepsilon_{t-1} = 0$) but now $\varepsilon_1 = 1$. Thus, the issue is to find the values of
$y_1$, $y_2$, $y_3$, and $y_4$ given that $\varepsilon_2 = \varepsilon_3 = \cdots = 0$.

C. Use your answer to part B to explain the following result. A student esti-
mated $\{y_t\}$ as an MA(2) process and found the residuals to be white-noise.
A second student estimated the same series as the ARCH-M process
$y_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \alpha_2\varepsilon_{t-2}^2 + \varepsilon_t$. Why might both estimates appear reasonable? How
would you decide which is the better model?

D. In general, explain why an ARCH-M model might appear to be a moving
average process.

10. Given the initial condition $y_0$, find the general solution and forecast function
for the following variants of the trend plus irregular model:

A. $y_t = \mu_t + v_t$, where $\mu_t = \mu_{t-1} + \varepsilon_t$, $v_t = (1 + \beta_1L)\eta_t$, and $E\varepsilon_t\eta_t = 0$.

B. $y_t = \mu_t + v_t$, where $\mu_t = \mu_{t-1} + \varepsilon_t$ and $v_t = (1 + \beta_1L)\eta_t$ and the correlation be-
tween $\varepsilon_t$ and $\eta_t$ equals unity.

C. Find the ARIMA representation of each model.

11. The columns in the file labeled EXRATES.WK1 contain exchange rate indices
for the British pound, French franc, German mark, Italian lira, Canadian dollar,
and Japanese yen over the 1973:Q1 to 1990:Q4 period. The units are currency
per U.S. dollar and the values have been converted into indices such that
1971:Q1 = 1.00.

For the yen and Canadian dollar (columns 5 and 6, respectively) you should
find the following:

Series           Observations  Mean           Standard Error  Minimum         Maximum
Yen              72            0.61561729167  0.15471136174   0.34800000000   0.84316700000
Canadian dollar  72            1.16505638889  0.12397561475   0.957116300000  1.391881000000

Form the logarithmic change of each of the two series.

A. Use the data for the yen/dollar exchange rate (i.e., the last column) to repro-
duce the results reported in the text.

B. Decompose the yen/dollar exchange rate into its permanent and transitory
components using the Beveridge and Nelson (1981) decomposition. You
should be able to reproduce the results in the text. 


C. Detrend the logarithm of the Canadian dollar (denoted by $y_t$) by estimating
the regression $y_t = a_0 + a_1t + \varepsilon_t$. Save the residuals and form the correlo-
gram. You should find that the residuals do not appear to be stationary. For
example, the ACF of the residuals is

ACF of the residuals:
 1:  0.9381108  0.8516773  0.7656438  0.6707062  0.5656608  0.4646090
 7:  0.3665752  0.2619469  0.1602961  0.0668779 -0.0233500 -0.0959095

D. Estimate the logarithmic change in the Canadian dollar as an MA(1) model.
You should find

$$\Delta y_t = \varepsilon_t + 0.6308671509\,\varepsilon_{t-1}$$

The standard error of $\beta_1$ is 0.0927381095, yielding a t-statistic of 6.80267. 


E. Perform the appropriate diagnostic checks of the model. Is it necessary to
include a constant? What about the autocorrelation coefficient of
0.2249136 at lag 3? You should verify that the Ljung-Box Q-statistics are:

Q(4)  =  5.6965, significance level of 0.12734706
Q(8)  =  7.9077, significance level of 0.34080750
Q(16) = 16.3652, significance level of 0.35820258


F. To keep the issue as simple as possible, proceed with the Beveridge and
Nelson decomposition using the MA(1) model. For each period $t$, form the
various $s$-step ahead forecasts. Why is it sufficient to set $s = 1$?

G. Form the trend and irregular components of the logarithm of the Canadian
dollar. You should be able to verify

Period   Log Canadian $    Trend Component   Temporary Component
1973:02  -0.008634167376   -0.006909245096   -0.001724922281
1973:03  -0.004671896332   -0.003260426492   -0.001411469840
1973:04  -0.008734030846   -0.012187148030    0.003453117184
1990:03   0.134045635379    0.132362802801    0.001682832578
1990:04   0.140821380972    0.146157620084   -0.005336239112

H. How would you select $s$ if you found the autoregressive coefficient at lag 3
to be important? 


I. Detrend at least one of the other exchange rate series in the file (you may
convert to logs). Does the detrended series appear to be stationary?
Compare with the first difference of the series. 


ENDNOTES 


1. Some authors prefer the spelling homoscedastic and heteroscedastic; both forms are
correct.

2. If the unconditional variance of a series is not constant, the series is nonstationary.
However, conditional heteroskedasticity is not a source of nonstationarity.

3. Letting $\alpha(L)$ and $\beta(L)$ be polynomials in the lag operator $L$, we can rewrite $h_t$ in the
form:

$$h_t = \alpha_0 + \alpha(L)\varepsilon_t^2 + \beta(L)h_t$$

The notation $\alpha(1)$ denotes the polynomial $\alpha(L)$ evaluated at $L = 1$; that is,
$\alpha(1) = \alpha_1 + \alpha_2 + \cdots + \alpha_q$. Bollerslev (1986) shows that the GARCH process is
stationary with $E\varepsilon_t = 0$, $\mathrm{var}(\varepsilon_t) = \alpha_0/[1 - \alpha(1) - \beta(1)]$, and $\mathrm{cov}(\varepsilon_t, \varepsilon_{t-s}) = 0$ for
$s \neq 0$ if $\alpha(1) + \beta(1) < 1$.

4. Unfortunately, there is no available method to test the null of white-noise errors versus
the specific alternative of GARCH($p$, $q$) errors. Bollerslev (1986) proves that the ACF
of the squared residuals resulting from (3.9) is an ARMA($m$, $p$) model, where $m =
\max(p, q)$. Question 3 asks you to illustrate this result.

5. Constraining the coefficients of $h_t$ to follow a decaying pattern conserves degrees of
freedom and considerably eases the estimation process. Moreover, the lagged coeffi-
cients given by $(9 - i)/36$ (i.e., 8/36, 7/36, ..., 1/36) are each positive and sum to unity.

6. Estimating a model with $n$ lags usually entails a loss of the first $n$ observations. To cor-
rect for this problem, the ARCH and GARCH models should be compared over the
identical sample period. In this way, the number of usable observations will be identical
for the two models. In this section, all models were estimated over the 1962:2 to 1992:2
sample period. One observation was lost due to differencing and eight were lost due to
the estimation of the ARCH(8) model. 

7. The estimated value of $h_t$ is the conditional variance of the logarithmic change in the
WPI; in constructing the figure, the interval for the percentage change was converted to
the level of the WPI.

8. In addition to the intercept term, three seasonal dummy variables were also included in
the supply equation.

9. If the underlying data-generating process is autoregressive, adaptive expectations and ra-
tional expectations can be perfectly consistent with each other.

10. If the utility function is quadratic and/or the excess returns from holding the asset are
normally distributed, an increase in the variance of returns is equivalent to an increase in
“risk.”

11. Of course, to the individual contemplating the purchase of a risky asset, the value of $\mu_t$
is not stochastic. Note that $\mu_t$ is the expected return that the individual would demand in
order to hold the long-term asset. 

12. The unconditional mean of $y_t$ is altered by changing only $\delta$. Changing $\beta$ and $\delta$ commen-
surately maintains the mean value of the $\{y_t\}$ sequence.

13. The Greek character set and subscripts descending below the line are not permitted in
RATS. To actually write such a program, the parameters $\beta$, $\alpha_0$, $\alpha_1$, and $h_t$ might be de-
noted by B, A0, A1, and H(T), respectively.

14. The method is recursive since the program first calculates $\varepsilon_t$, then $h_t$, and then LIKELI-
HOOD.

15. In actuality, the program steps in RATS would differ slightly since $\varepsilon_t$ could not be de-
fined in terms of its own lagged values. Similar remarks hold for all the program state-
ments below.

16. Many treatments use the representation $y_t$ = trend + cyclical + seasonal + irregular. In
the text, any cyclical components are included with the irregular term; the notion is that
cyclical economic components are not deterministic.

17. A linguist might want to know why “detrending” entails removing the deterministic
trend and not the stochastic trend. The reason is purely historical; originally, trends were
viewed as deterministic. Today, subtracting the deterministic time trend is still called
“detrending.” 

18. If $B(L)$ is of infinite order, it is assumed that $\sum\beta_i^2$ is finite.

19. If only $B(L)$ has a unit root, the process is not invertible. The $\{y_t\}$ sequence is stationary
(may be stationary), but the usual estimation techniques are inappropriate. If both $A(L)$
and $B(L)$ have unit roots, the common factor problem discussed in Chapter 2 exists. The
unit root can be factored from $A(L)$ and $B(L)$.

20. Also assume that all values of $\varepsilon_i$ are zero for $i < 1$.

21. As an exercise, prove that the first difference of the trend acts as a random walk plus
drift. Show that $\mu_t - \mu_{t-1}$ has the intercept $a_0$ plus a serially uncorrelated error.

22. The assumption that $\varepsilon_t$ and $\eta_t$ are uncorrelated places restrictions on the autoregressive
and moving average coefficients of $\Delta y_t$. For example, in the pure random walk plus
noise model, $\beta_1$ must be negative. To avoid estimating a constrained ARIMA model,
Watson estimates the trend and irregular terms as unobserved components. Many soft-
ware packages are capable of estimating such equations as time-varying parameter mod-
els. Details of the procedure can be obtained in Harvey (1989).




APPENDIX: Signal Extraction and Minimum Mean Square Errors

Linear Least-Squares Projection

The problem for the econometric forecaster is to select an optimal forecast of a ran-
dom variable $y$ conditional on the observation of a second variable $x$. Since the the-
ory is quite general, for the time being we ignore time subscripts. Call this condi-
tional forecast $y^*$, so that the forecast error is $(y - y^*)$ and the mean square forecast
error (MSE) is $E(y - y^*)^2$. One criterion used to compare forecast functions is the
MSE; the optimal forecast function is that with the smallest MSE.

Suppose $x$ and $y$ are jointly distributed random variables with known distribu-
tions. Let the mean and variance of $x$ be $\mu_x$ and $\sigma_x^2$, respectively. Also suppose the
value of $x$ is observed before having to predict $y$. A linear forecast will be such that
the forecast $y^*$ is a linear function of $x$. The optimal forecast will necessarily be lin-
ear if $x$ and $y$ are linearly related, and/or if they are bivariate normally distributed
variables. In this text, only linear relationships are considered; hence, the optimal
forecast $y^*$ has the form

$$y^* = a + b(x - \mu_x)$$

The problem is to select the values of $a$ and $b$ so as to minimize the MSE:

$$\min_{\{a,\,b\}} E(y - y^*)^2 = E[y - a - b(x - \mu_x)]^2$$

$$= E[y^2 + a^2 + b^2(x - \mu_x)^2 - 2ay + 2ab(x - \mu_x) - 2by(x - \mu_x)]$$

Since $E(x - \mu_x) = 0$, $Ey = \mu_y$, $E(x - \mu_x)^2 = \sigma_x^2$, and $E(xy) - \mu_x\mu_y = \mathrm{cov}(x, y) = \sigma_{xy}$,
it follows that

$$E(y - y^*)^2 = Ey^2 + a^2 + b^2\sigma_x^2 - 2a\mu_y - 2b\sigma_{xy}$$

Minimizing with respect to $a$ and $b$ yields

$$a = \mu_y, \qquad b = \sigma_{xy}/\sigma_x^2$$

Thus, the optimal prediction formula is

$$y^* = \mu_y - (\sigma_{xy}/\sigma_x^2)\mu_x + (\sigma_{xy}/\sigma_x^2)x$$

The forecast is unbiased in the sense that the mean value of the forecast is equal
to the mean value of $y$. Take the expected value of $y^*$ to obtain

$$Ey^* = E[\mu_y - (\sigma_{xy}/\sigma_x^2)\mu_x + (\sigma_{xy}/\sigma_x^2)x]$$

Since $\mu_y$, $\sigma_{xy}$, and $\sigma_x^2$ are constants and $\mu_x = Ex$, it follows that

$$Ey^* = \mu_y$$

You should recognize this formula from standard regression analysis; a regres-
sion equation is the minimum mean square error, linear, unbiased forecast of $y$.
The argument easily generalizes to forecasting $y$ conditional on the observation of the
$n$ variables $x_1$ through $x_n$ and forecasting $y_{t+s}$ conditional on the observation of $y_t$,
$y_{t-1}, \ldots$. For example, if $y_t = a_0 + a_1y_{t-1} + \varepsilon_t$, the conditional forecast of $y_{t+1}$ is
$E_ty_{t+1} = a_0 + a_1y_t$. The forecasts of $y_{t+s}$ can be obtained using the forecast function
(or iterative forecasts) discussed in Section 11 of Chapter 2.

Signal Extraction

Signal extraction issues arise when we try to decompose a series into its individual
components. Suppose we observe the realizations of a stationary sequence $\{y_t\}$ and
want to find the optimal predictor of its components. If we phrase the problem this
way, it is clear that the decomposition can be performed using the minimum MSE
criterion discussed above. As an example of the technique, consider a sequence
composed of two independent white-noise components:

$$y_t = \varepsilon_t + \eta_t$$

where $E\varepsilon_t = 0$, $E\eta_t = 0$, $E\varepsilon_t\eta_t = 0$, $E\varepsilon_t^2 = \sigma^2$, and $E\eta_t^2 = \sigma_\eta^2$.

Here, the correlation between the innovations is assumed to be equal to zero; it is
straightforward to allow nonzero values of $E\varepsilon_t\eta_t$. The problem is to find the optimal
prediction, or forecast, of $\varepsilon_t$ (called $\varepsilon_t^*$) conditioned on the observation of $y_t$. The lin-
ear forecast has the form

$$\varepsilon_t^* = a + by_t$$

In this problem, the intercept term $a$ will be zero, so that the MSE can be written

$$\mathrm{MSE} = E(\varepsilon_t - \varepsilon_t^*)^2 = E(\varepsilon_t - by_t)^2 = E[\varepsilon_t - b(\varepsilon_t + \eta_t)]^2$$

Hence the optimization problem is to select $b$ so as to minimize:

$$\mathrm{MSE} = E[(1 - b)\varepsilon_t - b\eta_t]^2 = (1 - b)^2E\varepsilon_t^2 + b^2E\eta_t^2 \quad \text{since } E\varepsilon_t\eta_t = 0$$


The first-order condition is

$$-2(1 - b)\sigma^2 + 2b\sigma_\eta^2 = 0$$

so that

$$b = \sigma^2/(\sigma^2 + \sigma_\eta^2)$$

Here, $b$ partitions $y_t$ in accordance with the relative variance of $\varepsilon_t$; that is,
$\sigma^2/(\sigma^2 + \sigma_\eta^2)$. As $\sigma^2$ becomes very large relative to $\sigma_\eta^2$, $b \to 1$; as $\sigma^2$ becomes very small
relative to $\sigma_\eta^2$, $b \to 0$. Having extracted $\varepsilon_t$, we see that the predicted value of $\eta_t$ is
$\eta_t^* = y_t - \varepsilon_t^*$. However, this optimal value of $b$ depends on the assumption that the
two innovations are uncorrelated.

Forecasts of a Nonstationary Series Based on Observables

Muth (1960) considers the situation in which a researcher wants to find the optimal
forecast of $y_t$ conditional on the observed values of $y_{t-1}, y_{t-2}, \ldots$. Let $\{y_t\}$ be a ran-
dom walk plus noise. If all realizations of $\{\varepsilon_t\}$ are zero for $t < 0$, the solution for $y_t$
is

$$y_t = \sum_{i=1}^{t}\varepsilon_i + \eta_t \qquad \text{(A3.1)}$$

where $y_0$ is given and $\mu_0 = 0$.

Let the forecast of $y_t$ be a linear function of the past values of the series, so that

$$y_t^* = \sum_{i=1}^{\infty}v_iy_{t-i} \qquad \text{(A3.2)}$$

where the various values of $v_i$ are selected so as to minimize the mean square
forecast error.

Use (A3.1) to find each value of $y_{t-i}$ and substitute into (A3.2) so that

$$y_t^* = v_1\left(\sum_{i=1}^{t-1}\varepsilon_i + \eta_{t-1}\right) + v_2\left(\sum_{i=1}^{t-2}\varepsilon_i + \eta_{t-2}\right) + v_3\left(\sum_{i=1}^{t-3}\varepsilon_i + \eta_{t-3}\right) + \cdots$$

Thus, the optimization problem is to select the $v_j$ so as to minimize the MSE:

$$E[y_t - y_t^*]^2 = E\left[\sum_{i=1}^{t}\varepsilon_i + \eta_t - v_1\left(\sum_{i=1}^{t-1}\varepsilon_i + \eta_{t-1}\right) - v_2\left(\sum_{i=1}^{t-2}\varepsilon_i + \eta_{t-2}\right) - \cdots\right]^2$$

Since all cross-products are zero, the problem is to select the $v_j$ so as to minimize

$$\mathrm{MSE} = \sigma^2 + \sigma_\eta^2 + \sigma^2\sum_{j=1}^{\infty}\left(1 - \sum_{i=1}^{j}v_i\right)^2 + \sigma_\eta^2\sum_{j=1}^{\infty}v_j^2$$

For each value of $v_k$, the first-order condition is

$$2\sigma_\eta^2v_k - 2\sigma^2\sum_{j=k}^{\infty}\left(1 - \sum_{i=1}^{j}v_i\right) = 0, \qquad k = 1, 2, \ldots \qquad \text{(A3.3)}$$

All $\{v_k\}$ will satisfy the difference equation given by (A3.3). To characterize the
nature of the solution, set $k = 1$, so that the first equation of (A3.3) is

$$2\sigma_\eta^2v_1 - 2\sigma^2\sum_{j=1}^{\infty}\left(1 - \sum_{i=1}^{j}v_i\right) = 0$$

and for $k = 2$,

$$2\sigma_\eta^2v_2 - 2\sigma^2\sum_{j=2}^{\infty}\left(1 - \sum_{i=1}^{j}v_i\right) = 0$$

so that by subtraction,

$$\sigma^2(1 - v_1) + \sigma_\eta^2(v_2 - v_1) = 0 \qquad \text{(A3.4)}$$

Now take the second difference of (A3.3) to obtain

$$-v_{k-1} + [2 + (\sigma^2/\sigma_\eta^2)]v_k - v_{k+1} = 0, \qquad \text{for } k = 2, 3, \ldots$$

The solution to this homogeneous second-order difference equation has the form
$v_k = A_1\lambda_1^k + A_2\lambda_2^k$, where $A_1$ and $A_2$ are arbitrary constants and $\lambda_1$ and $\lambda_2$ the charac-
teristic roots. If you use the quadratic formula, you will find that the larger root
(say, $\lambda_2$) is greater than unity; hence, if the $\{v_k\}$ sequence is to be convergent, $A_2$
must equal zero. The smaller root satisfies

$$\lambda_1^2 - [2 + (\sigma^2/\sigma_\eta^2)]\lambda_1 + 1 = 0 \qquad \text{(A3.5)}$$

To find the value of $A_1$, substitute $v_1 = A_1\lambda_1$ and $v_2 = A_1\lambda_1^2$ into (A3.4):

$$\sigma^2(1 - A_1\lambda_1) - \sigma_\eta^2A_1\lambda_1(1 - \lambda_1) = 0$$

If you solve (A3.5) for $\lambda_1$, it is possible to verify

$$A_1 = (1 - \lambda_1)/\lambda_1$$

Hence, the $v_k$ are determined by

$$v_k = (1 - \lambda_1)\lambda_1^{k-1}$$

The one-step ahead forecast of $y_t$ is

$$y_t^* = (1 - \lambda_1)\sum_{j=1}^{\infty}\lambda_1^{j-1}y_{t-j}$$

Since $|\lambda_1| < 1$, the summation is such that $(1 - \lambda_1)\sum\lambda_1^{j-1} = 1$. Hence, the optimal
forecast of $y_t$ can be formed as a geometrically weighted average of the past realiza-
tions of the series. 
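The geometric weights can be computed directly from the smaller root of (A3.5). The following sketch assumes illustrative variances ($\sigma^2 = \sigma_\eta^2 = 1$) and a finite history, so the weights are truncated and sum to slightly less than one; the function names are hypothetical and are not taken from Muth (1960).

```python
import math

def smaller_root(sigma2_eps, sigma2_eta):
    """Smaller root of lam^2 - [2 + sigma2_eps / sigma2_eta] * lam + 1 = 0 (A3.5)."""
    b = 2.0 + sigma2_eps / sigma2_eta
    return (b - math.sqrt(b * b - 4.0)) / 2.0

def muth_weights(sigma2_eps, sigma2_eta, n_weights):
    """Weights v_k = (1 - lam) * lam^(k-1) for k = 1..n_weights."""
    lam = smaller_root(sigma2_eps, sigma2_eta)
    return [(1.0 - lam) * lam ** (k - 1) for k in range(1, n_weights + 1)]

def one_step_forecast(history, sigma2_eps, sigma2_eta):
    """Geometrically weighted average of past values; `history` is ordered
    from oldest to most recent, so the last element gets weight v_1."""
    weights = muth_weights(sigma2_eps, sigma2_eta, len(history))
    return sum(w * y for w, y in zip(weights, reversed(history)))

lam_1 = smaller_root(1.0, 1.0)            # equal variances (an assumption)
weights = muth_weights(1.0, 1.0, 200)
v1, v2 = weights[0], weights[1]
foc_check = 1.0 * (1.0 - v1) + 1.0 * (v2 - v1)   # left-hand side of (A3.4)
flat_forecast = one_step_forecast([3.0] * 200, 1.0, 1.0)
```

With equal variances the root is about 0.38, so the most recent observation receives a weight of roughly 0.62. A larger $\sigma^2$ relative to $\sigma_\eta^2$ lowers $\lambda_1$ and concentrates the weight on recent observations, which is sensible when most movements in the series are permanent.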


The Hodrick-Prescott Decomposition 

Another method of decomposing a series into a trend and stationary component has
been developed by Hodrick and Prescott (1984). Suppose you observe the values $y_1$
through $y_T$ and want to decompose the series into a trend $\{\mu_t\}$ and stationary com-
ponent $y_t - \mu_t$. Consider the sum of squares

$$\frac{1}{T}\sum_{t=1}^{T}(y_t - \mu_t)^2 + \frac{\lambda}{T}\sum_{t=2}^{T-1}[(\mu_{t+1} - \mu_t) - (\mu_t - \mu_{t-1})]^2$$

The problem is to select the $\{\mu_t\}$ sequence so as to minimize this sum of squares.
In the minimization problem, $\lambda$ is an arbitrary constant reflecting the “cost” or
penalty of incorporating fluctuations into the trend. In many applications, including
Hodrick and Prescott (1984) and Farmer (1993), $\lambda$ is set equal to 1600. Increasing
the value of $\lambda$ acts to “smooth out” the trend. If $\lambda = 0$, the sum of squares is mini-
mized when $y_t = \mu_t$; the trend is equal to $y_t$ itself. As $\lambda \to \infty$, the trend approaches a
linear time trend. Intuitively, for large values of $\lambda$, the Hodrick-Prescott decomposition
forces the change in the trend (i.e., $\Delta\mu_{t+1} - \Delta\mu_t$) to be as small as possible. This oc-
curs when the trend is linear. 
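The minimization has a closed-form solution: setting the derivative with respect to each $\mu_t$ to zero gives the linear system $(I + \lambda K'K)\mu = y$, where $K$ is the matrix of second differences. The sketch below solves that system with plain Gaussian elimination, which is adequate for short series; the function names and the test series are illustrative assumptions rather than anything prescribed by Hodrick and Prescott (1984).

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            if factor != 0.0:
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def hp_trend(y, lam):
    """Trend that minimizes sum (y - mu)^2 + lam * sum (second difference of mu)^2."""
    n = len(y)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for r in range(n - 2):
        # Row r of K takes mu_r - 2 mu_{r+1} + mu_{r+2}; add lam * K'K term by term
        coeffs = ((r, 1.0), (r + 1, -2.0), (r + 2, 1.0))
        for i, ci in coeffs:
            for j, cj in coeffs:
                a[i][j] += lam * ci * cj
    return solve_linear_system(a, [float(v) for v in y])

def second_differences(series):
    return [c - 2.0 * b + a for a, b, c in zip(series, series[1:], series[2:])]

zigzag = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
trend_no_penalty = hp_trend(zigzag, 0.0)     # reproduces the series itself
trend_heavy = hp_trend(zigzag, 1e8)          # close to a straight line
linear = [2.0 + 0.5 * t for t in range(20)]
trend_linear = hp_trend(linear, 1600.0)      # a linear series is its own trend
```

The common factor $1/T$ in the sum of squares does not affect the minimizer, so it is omitted in the code. For long series a banded solver would be more efficient than the dense elimination used here.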

The benefit of the Hodrick-Prescott decomposition is that it can extract the same 
trend from a set of variables. For example, many real business cycle models indi- 
cate that all variables will have the same stochastic trend. A Beveridge and Nelson 
decomposition separately applied to each variable will not yield the same trend for 
each. 


Chapter 4 


TESTING FOR TRENDS AND 
UNIT ROOTS 




Inspection of the autocorrelation function serves as a rough indicator of whether a
trend is present in a series. A slowly decaying ACF is indicative of a large charac- 
teristic root, true unit root process, or trend stationary process. Formal tests can 
help determine whether or not a system contains a trend and whether that trend is 
deterministic or stochastic. However, the existing tests have little power to distin- 
guish between near unit root and unit root processes. The aims of this chapter are 
to: 


1. Develop and illustrate the Dickey-Fuller and augmented Dickey-Fuller tests for
the presence of a unit root. These tests can also be used to help detect the presence
of a deterministic trend. Phillips-Perron tests, which entail less stringent
restrictions on the error process, are illustrated.

2. Consider tests for unit roots in the presence of structural change. Structural
change can complicate the tests for trends; a policy regime change can result in
a structural break that makes an otherwise stationary series appear to be
nonstationary.

3. Illustrate a general procedure to determine whether or not a series contains a unit 
root. Unit root tests are sensitive to the presence of deterministic regressors, 
such as an intercept term or a deterministic time trend. As such, there is a so- 
phisticated set of procedures that can aid in the identification process. These 
procedures can be used if it is not known what deterministic elements are part of 
the true data-generating process. It is important to be wary of the results from 
such tests since (1) they all have low power to discriminate between a unit root 
and near unit root process and (2) they may have used an inappropriate set of de- 
terministic regressors. 


1. UNIT ROOT PROCESSES


As shown in the last chapter, there are important differences between stationary and 
nonstationary time series. Shocks to a stationary time series are necessarily tempo- 
rary; over time, the effects of the shocks will dissipate and the series will revert to 
its long-run mean level. As such, long-term forecasts of a stationary series will con- 
verge to the unconditional mean of the series. To aid in identification, we know that 
a covariance stationary series: 


1. Exhibits mean reversion in that it fluctuates around a constant long-run mean. 
2. Has a finite variance that is time-invariant. 


3. Has a theoretical correlogram that diminishes as lag length increases. 


On the other hand, a nonstationary series necessarily has permanent components. 
The mean and/or variance of a nonstationary series are time-dependent. To aid in 
the identification of a nonstationary series, we know that: 


1. There is no long-run mean to which the series returns. 
2. The variance is time-dependent and goes to infinity as time approaches infinity. 


3. Theoretical autocorrelations do not decay but, in finite samples, the sample cor- 
relogram dies out slowly. 


Although the properties of a sample correlogram are useful tools for detecting 
the possible presence of unit roots, the method is necessarily imprecise. What may 
appear as a unit root to one observer may appear as a stationary process to another.
The problem is difficult because a near unit root process will have the same shaped
ACF as a unit root process. For example, the correlogram of a stationary AR(1)
process such that $\rho(1) = 0.99$ will exhibit the type of gradual decay indicative of a
nonstationary process. To illustrate some of the issues involved, suppose that we
know a series is generated from the following first-order process:¹

$$y_t = a_1 y_{t-1} + \varepsilon_t \tag{4.1}$$

where $\{\varepsilon_t\}$ is generated from a white-noise process.

First, suppose that we wish to test the null hypothesis that $a_1 = 0$. Under the
maintained null hypothesis of $a_1 = 0$, we can estimate (4.1) using OLS. The fact
that $\varepsilon_t$ is a white-noise process and $|a_1| < 1$ guarantees that the $\{y_t\}$ sequence is
stationary and the estimate of $a_1$ is efficient. Calculating the standard error of the
estimate of $a_1$, the researcher can use a t-test to determine whether $a_1$ is
significantly different from zero.

The situation is quite different if we want to test the hypothesis $a_1 = 1$. Now, under
the null hypothesis, the $\{y_t\}$ sequence is generated by the nonstationary process:


$$y_t = \sum_{i=1}^{t} \varepsilon_i \tag{4.2}$$


Thus, if $a_1 = 1$, the variance becomes infinitely large as $t$ increases. Under the
null hypothesis, it is inappropriate to use classical statistical methods to estimate
and perform significance tests on the coefficient $a_1$. If the $\{y_t\}$ sequence is
generated as in (4.2), it is simple to show that the OLS estimate of (4.1) will yield a
biased estimate of $a_1$. In Section 8 of the previous chapter, it was shown that the
first-order autocorrelation coefficient in a random walk model is

$$\rho_1 = [(t - 1)/t]^{0.5} < 1$$

Since the estimate of $a_1$ is directly related to the value of $\rho_1$, the estimated value
of $a_1$ is biased to be below its true value of unity. The estimated model will mimic
that of a stationary AR(1) process with a near unit root. Hence, the usual t-test cannot
be used to test the hypothesis $a_1 = 1$.

Figure 4.1 shows the sample correlogram for a simulated random walk process.
One hundred normally distributed random deviates were obtained so as to mimic
the $\{\varepsilon_t\}$ sequence. Assuming $y_0 = 0$, we can calculate the next 100 values in the $\{y_t\}$
sequence as $y_t = y_{t-1} + \varepsilon_t$. This particular correlogram is characteristic of most
sample correlograms constructed from nonstationary data. The estimated value of $\rho_1$ is
close to unity and the sample autocorrelations die out slowly. If we did not know
the way in which the data were generated, inspection of Figure 4.1 might lead us to
falsely conclude that the data were generated from a stationary process. With this
particular data, estimates of an AR(1) model with and without an intercept yield
(standard errors are in parentheses):

$$y_t = \underset{(0.030)}{0.9546}\,y_{t-1} + \varepsilon_t, \qquad R^2 = 0.86 \tag{4.3}$$

$$y_t = 0.164 + \underset{(0.037)}{0.9247}\,y_{t-1} + \varepsilon_t, \qquad R^2 = 0.864 \tag{4.4}$$


Examining (4.3), a careful researcher would not be willing to dismiss the possibility
of a unit root since the estimated value of $a_1$ is only 1.5133 standard deviations
from unity. We might correctly recognize that under the null hypothesis of a
unit root, the estimate of $a_1$ will be biased below unity. If we knew the true
distribution of $a_1$ under the null of a unit root, we could perform such a significance test.
Of course, if we did not know the true data-generating process, we might estimate
the model with an intercept. In (4.4), the estimate of $a_1$ is more than two standard
deviations from unity: (1 - 0.9247)/0.037 = 2.035. However, it would be wrong to
use this information to reject the null of a unit root. After all, the point of this
section has been to indicate that such t-tests are inappropriate under the null of a unit
root.
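
The experiment behind (4.3) and (4.4) is easy to repeat. The sketch below is an illustration only: it assumes standard normal errors and an arbitrary seed, the helper names `ar1_ols` and `simulate_random_walk` are invented for this example, and the estimates it prints will not match the figures reported in (4.3) and (4.4) because the random draws differ.

```python
import numpy as np

def ar1_ols(y, intercept=False):
    """OLS estimate of a_1 in y_t = [a_0 +] a_1*y_{t-1} + e_t.

    Returns (a1_hat, conventional OLS standard error of a1_hat).
    """
    y = np.asarray(y, dtype=float)
    lhs, lag = y[1:], y[:-1]
    X = np.column_stack([np.ones(lag.size), lag]) if intercept else lag[:, None]
    coef, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    resid = lhs - X @ coef
    s2 = resid @ resid / (lhs.size - X.shape[1])   # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)               # usual OLS covariance matrix
    return coef[-1], np.sqrt(cov[-1, -1])           # a_1 is the last coefficient

def simulate_random_walk(T=100, seed=0):
    """y_0 = 0 and y_t = y_{t-1} + e_t with standard normal e_t (an assumption)."""
    rng = np.random.default_rng(seed)
    return np.concatenate([[0.0], np.cumsum(rng.standard_normal(T))])

y = simulate_random_walk()
for intercept in (False, True):
    a1, se = ar1_ols(y, intercept)
    # Last column: distance of the estimate from unity, in standard errors
    print(intercept, round(a1, 4), round(se, 4), round((1 - a1) / se, 3))
```

The last printed column is the quantity discussed above. It should not be compared with the usual t-table, for the reasons given in this section.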

Figure 4.1
(a) A simulated random walk process.
(b) Correlogram of the process.

Fortunately, Dickey and Fuller (1979, 1981) devised a procedure to formally test
for the presence of a unit root. Their methodology is similar to that used in
constructing the data reported in Figure 4.1. Suppose that we generated thousands of
such random walk sequences and for each we calculated the estimated value of $a_1$.
Although most of the estimates would be close to unity, some would be further
from unity than others. In performing this experiment, Dickey and Fuller found that
in the presence of an intercept:

Ninety percent of the estimated values of $a_1$ are less than 2.58 standard errors from
unity.

Ninety-five percent of the estimated values of $a_1$ are less than 2.89 standard errors
from unity.

Ninety-nine percent of the estimated values of $a_1$ are less than 3.51 standard errors
from unity.²


The application of these Dickey-Fuller critical values to tests for unit roots is
straightforward. Suppose we did not know the true data-generating process and
were trying to ascertain whether the data used in Figure 4.1 contained a unit root.
Using these Dickey-Fuller statistics, we would not reject the null of a unit root in
(4.4). The estimated value of $a_1$ is only 2.035 standard deviations from unity. In
fact, if the true value of $a_1$ does equal unity, we should find the estimated value to
be within 2.58 standard deviations from unity 90% of the time.

Be aware that stationarity necessitates $-1 < a_1 < 1$. Thus, if the estimated value
of $a_1$ is close to $-1$, you should also be concerned about nonstationarity. If we
define $\gamma = a_1 - 1$, the equivalent restriction is $-2 < \gamma < 0$. In conducting a
Dickey-Fuller test, it is possible to check that the estimated value of $\gamma$ is greater than $-2$.³


Monte Carlo Simulation 


The procedure Dickey and Fuller (1979, 1981) used to obtain their critical values is
typical of that found in the modern time series literature. Hypothesis tests concerning
the coefficients of non-stationary variables cannot be conducted using traditional
t-tests or F-tests. The distributions of the appropriate test statistics are
non-standard and cannot be analytically evaluated. However, given the trivial cost of
computer time, the non-standard distributions can easily be derived using a Monte
Carlo simulation.

The first step in the procedure is to computer-generate a set of random numbers
(sometimes called pseudo-random numbers) from a given distribution. Of course,
the numbers cannot be entirely random since all computer algorithms rely on a
deterministic number-generating mechanism. However, the numbers are drawn so as
to mimic a random process having some specified distribution. Usually, the numbers
are designed to be normally distributed and serially uncorrelated. The idea is to
use these numbers to represent one replication of the entire $\{\varepsilon_t\}$ sequence.

All major statistical packages have a built-in random number generator. An inter- 
esting experiment is to use your software package to draw a set of 100 random 
numbers and check for serial correlation. In almost all circumstances, they will be 
highly correlated. In your own work, if you need to use serially uncorrelated
numbers, you can model the computer-generated numbers using the Box-Jenkins
methodology. The residuals should approximate white noise.

The second step is to specify the parameters and initial conditions of the $\{y_t\}$
sequence. Using these parameters, initial conditions, and random numbers, the $\{y_t\}$
sequence can be constructed. Note that the simulated ARCH processes in Figure 3.9 and
random-walk process in Figure 4.1 were constructed in precisely this fashion.
Similarly, Dickey and Fuller (1979, 1981) obtained 100 values for $\{\varepsilon_t\}$, set $a_1 = 1$,
$y_0 = 0$, and calculated 100 values for $\{y_t\}$ according to (4.1). At this point, the
parameters of interest (such as the estimate of $a_1$ or the in-sample variance of $y_t$) can be
obtained.

The beauty of the method is that all important attributes of the constructed $\{y_t\}$
sequence are known to the researcher. For this reason, a Monte Carlo simulation is
often referred to as an “experiment.” The only problem is that the set of random
numbers drawn is just one possible outcome. Obviously, the estimates in (4.3) and
(4.4) are dependent on the values of the simulated $\{\varepsilon_t\}$ sequence. Different
outcomes for $\{\varepsilon_t\}$ will yield different values of the simulated $\{y_t\}$ sequence.

This is why the Monte Carlo studies perform many replications of the process
outlined above. The third step is to replicate steps 1 and 2 thousands of times. The
goal is to ensure that the statistical properties of the constructed $\{y_t\}$ sequence are
in accord with the true distribution. Thus, for each replication, the parameters of
interest are tabulated and critical values (or confidence intervals) obtained. As such,
the properties of your data set can be compared to the properties of the simulated
data so that hypothesis tests can be performed. This is the justification for using the
Dickey-Fuller critical values to test the hypothesis $a_1 = 1$.
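
The three steps can be sketched in Python as follows. This is only an illustration under stated assumptions: standard normal errors, $y_0 = 0$, 100 usable observations, and a regression that includes an intercept. The function names `df_tau_mu` and `simulate_critical_values` are invented for this example, and the code is not a reproduction of Dickey and Fuller's original computations.

```python
import numpy as np

def df_tau_mu(y):
    """t-statistic on gamma in  dy_t = a_0 + gamma*y_{t-1} + e_t."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(dy.size), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    s2 = resid @ resid / (dy.size - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])

def simulate_critical_values(T=100, reps=5000, seed=0):
    """Return the simulated 10%, 5%, and 1% points of the statistic."""
    rng = np.random.default_rng(seed)
    stats = np.empty(reps)
    for r in range(reps):
        eps = rng.standard_normal(T)                  # step 1: draw the errors
        y = np.concatenate([[0.0], np.cumsum(eps)])   # step 2: a_1 = 1, y_0 = 0
        stats[r] = df_tau_mu(y)                       # tabulate the statistic
    # step 3: after many replications, read off the lower-tail percentiles
    return np.percentile(stats, [10, 5, 1])

if __name__ == "__main__":
    print(simulate_critical_values())
```

With several thousand replications, the three simulated values should lie in the neighborhood of the -2.58, -2.89, and -3.51 implied by the percentages cited earlier, although the exact figures vary with the seed and the number of replications.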

One limitation of a Monte Carlo experiment is that it is specific to the assumptions
used to generate the simulated data. If you change the sample size, include (or
delete) an additional parameter in the data-generating process, or use alternative
initial conditions, an entirely new simulation needs to be performed. Nevertheless, you
should be able to envision many applications of Monte Carlo simulations. As dis- 
cussed in Hendry, Neale, and Ericsson (1990), they are particularly useful for 
studying the small sample properties of time-series data. As you will see shortly, 
Monte Carlo simulations are the workhorse of unit root tests. 


Unit Roots in a Regression Model 


The unit root issue arises quite naturally in the context of the standard regression
model. Consider the regression equation:⁴

$$y_t = a_0 + a_1 z_t + e_t \tag{4.5}$$


The assumptions of the classical regression model necessitate that both the $\{y_t\}$
and $\{z_t\}$ sequences be stationary and the errors have a zero mean and finite
variance. In the presence of nonstationary variables, there might be what Granger and
Newbold (1974) call a spurious regression. A spurious regression has a high $R^2$ and
t-statistics that appear to be significant, but the results are without any economic
meaning. The regression output “looks good” because the least-squares estimates
are not consistent and the customary tests of statistical inference do not hold.
Granger and Newbold (1974) provide a detailed examination of the consequences
of violating the stationarity assumption by generating two sequences, $\{y_t\}$ and $\{z_t\}$,
as independent random walks using the formulas:

$$y_t = y_{t-1} + \varepsilon_{yt} \tag{4.6}$$

and

$$z_t = z_{t-1} + \varepsilon_{zt} \tag{4.7}$$

where $\varepsilon_{yt}$ and $\varepsilon_{zt}$ are white-noise processes independent of each other.


In their Monte Carlo analysis, Granger and Newbold generated many such samples
and for each sample estimated a regression in the form of (4.5). Since the $\{y_t\}$
and $\{z_t\}$ sequences are independent of each other, Equation (4.5) is necessarily
meaningless; any relationship between the two variables is spurious. Surprisingly,
at the 5% significance level, they were able to reject the null hypothesis $a_1 = 0$
approximately 75% of the time. Moreover, the regressions usually had very high $R^2$
values and the estimated residuals exhibited a high degree of autocorrelation.
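
An experiment of this kind can be sketched in a few lines of Python. The sample length of 50, the use of 1.96 as the two-sided 5% critical value, the standard normal errors, and the function names are all assumptions made for illustration; they are not taken from Granger and Newbold's paper.

```python
import numpy as np

def slope_t_stat(y, z):
    """t-statistic for a_1 = 0 in the OLS regression y_t = a_0 + a_1*z_t + e_t."""
    X = np.column_stack([np.ones(z.size), z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (y.size - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])

def spurious_rejection_rate(T=50, reps=1000, crit=1.96, seed=0):
    """Share of replications in which a_1 = 0 is (wrongly) rejected."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        y = np.cumsum(rng.standard_normal(T))   # (4.6): random walk
        z = np.cumsum(rng.standard_normal(T))   # (4.7): independent random walk
        rejections += abs(slope_t_stat(y, z)) > crit
    return rejections / reps

if __name__ == "__main__":
    print(spurious_rejection_rate())
```

Because the two series are independent by construction, a correctly sized test would reject in about 5% of the replications. The simulated rejection rate is far higher.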

To explain the Granger and Newbold findings, note that the regression equation
(4.5) is necessarily meaningless if the residual series $\{e_t\}$ is nonstationary.
Obviously, if the $\{e_t\}$ sequence has a stochastic trend, any error in period $t$ never
decays, so that the deviation from the model is permanent. It is hard to imagine
attaching any importance to an economic model having permanent errors. The
simplest way to examine the properties of the $\{e_t\}$ sequence is to abstract from the
intercept term $a_0$ and rewrite (4.5) as

$$e_t = y_t - a_1 z_t$$

If $z_t$ and $y_t$ are generated by (4.6) and (4.7), we can impose the initial conditions
$y_0 = z_0 = 0$, so that

$$e_t = \sum_{i=1}^{t} \varepsilon_{yi} - a_1 \sum_{i=1}^{t} \varepsilon_{zi} \tag{4.8}$$


Clearly, the variance of the error becomes infinitely large as $t$ increases. Moreover,
the error has a permanent component in that $E_t e_{t+i} = e_t$ for all $i \ge 0$. Hence, the
assumptions embedded in the usual hypothesis tests are violated, so that any t-test,
F-test, or $R^2$ values are unreliable. It is easy to see why the estimated residuals from
a spurious regression will exhibit a high degree of autocorrelation. Updating (4.8),
you should be able to demonstrate that the theoretical value of the correlation
coefficient between $e_t$ and $e_{t+1}$ goes to unity as $t$ increases.
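
A short derivation makes both claims concrete. It is a sketch under one added assumption that is not stated in the text: $\varepsilon_{yt}$ and $\varepsilon_{zt}$ are mutually independent white-noise processes with constant variances, written here as $\sigma_y^2$ and $\sigma_z^2$.

```latex
% Variance of the error in (4.8): t independent shocks of each type
\operatorname{var}(e_t) = t\,(\sigma_y^2 + a_1^2 \sigma_z^2)

% Updating (4.8) by one period
e_{t+1} = e_t + \varepsilon_{y,t+1} - a_1 \varepsilon_{z,t+1}
\quad\Longrightarrow\quad
\operatorname{cov}(e_t, e_{t+1}) = \operatorname{var}(e_t)

% Correlation between adjacent errors
\operatorname{corr}(e_t, e_{t+1})
  = \frac{\operatorname{var}(e_t)}{\sqrt{\operatorname{var}(e_t)\,\operatorname{var}(e_{t+1})}}
  = \sqrt{\frac{t}{t+1}} \;\longrightarrow\; 1 \quad \text{as } t \to \infty
```

The variance grows linearly in $t$ without bound, and the correlation has the same square-root form as the random walk autocorrelation $\rho_1$ given earlier in this section.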

The essence of the problem is that if $a_1 = 0$, the data-generating process in (4.5)
is $y_t = a_0 + e_t$. Given that $\{y_t\}$ is integrated of order one [i.e., I(1)], it follows that
$\{e_t\}$ is I(1) under the null hypothesis. However, the assumption that the error term
is a unit root process is inconsistent with the distributional theory underlying the
use of OLS. This problem will not disappear in large samples. In fact, Phillips
(1986) proves that the larger the sample, the more likely you are to falsely conclude
that $a_1 \neq 0$.

Worksheet 4.1 illustrates the problem of spurious regressions. The top two
graphs show 100 realizations of the $\{y_t\}$ and $\{z_t\}$ sequences generated according to
(4.6) and (4.7). Although $\{\varepsilon_{yt}\}$ and $\{\varepsilon_{zt}\}$ are drawn from white-noise distributions,
the realizations of the two sequences are such that $y_{100}$ is positive and $z_{100}$ negative.
You can see that the regression of $y_t$ on $z_t$ captures the within-sample tendency of
the sequences to move in opposite directions. The straight line shown in the scatter
plot is the OLS regression line $y_t = -0.31 - 0.46z_t$. The correlation coefficient
between $\{y_t\}$ and $\{z_t\}$ is -0.372. The residuals from this regression have a unit root;
as such, the coefficients -0.31 and -0.46 are spurious. Worksheet 4.2 illustrates the
same problem using two simulated random walk plus drift sequences:
$y_t = 0.2 + y_{t-1} + \varepsilon_{yt}$ and $z_t = -0.1 + z_{t-1} + \varepsilon_{zt}$. The drift terms dominate, so that for
small values of $t$, it appears that $y_t = -2z_t$. As sample size increases, however, the
cumulated sum of


WORKSHEET 4.1 Spurious Regressions: Example 1

Consider the two random walk processes:

$$y_t = y_{t-1} + \varepsilon_{yt} \qquad z_t = z_{t-1} + \varepsilon_{zt}$$

(Graphs: time paths of $y_t$ and $z_t$ for $t$ = 0 to 100.)

Since the $\{\varepsilon_{yt}\}$ and $\{\varepsilon_{zt}\}$ sequences are independent, the regression of $y_t$ on $z_t$ is
spurious. Given the realizations of the random disturbances, it appears as if the two
sequences are related. In the scatter plot of $y_t$ against $z_t$, you can see that $y_t$ tends to
rise as $z_t$ decreases. The regression equation of $y_t$ on $z_t$ will capture this tendency.
The correlation coefficient between $y_t$ and $z_t$ is -0.372 and a linear regression yields
$y_t = -0.46z_t - 0.31$. However, the residuals from the regression equation are
nonstationary.

(Graphs: “Scatter plot of $y_t$ and $z_t$” with the regression line, and “Regression residuals” for $t$ = 0 to 100.)

the errors (i.e., $\sum e_t$) will pull the relationship further and further from -2.0. The
scatter plot of the two sequences suggests that the $R^2$ statistic will be close to unity;
in fact, $R^2$ is almost 0.97. However, as you can see in the last graph of Worksheet
4.2, the residuals from the regression equation are nonstationary. All departures
from this relationship are necessarily permanent.

The point is that the econometrician has to be very careful in working with non- 
stationary variables. In terms of (4.5), there are four cases to consider: 


CASE 1 


Both $\{y_t\}$ and $\{z_t\}$ are stationary. When both variables are stationary, the classical
regression model is appropriate.


CASE 2 


The $\{y_t\}$ and $\{z_t\}$ sequences are integrated of different orders. Regression equations
using such variables are meaningless. For example, replace (4.7) by the stationary
process $z_t = \rho z_{t-1} + \varepsilon_{zt}$, where $|\rho| < 1$. Now (4.8) is replaced by
$e_t = \sum \varepsilon_{yi} - a_1 \sum \rho^i \varepsilon_{zt-i}$. Although the expression $\sum \rho^i \varepsilon_{zt-i}$ is convergent, the
$\{e_t\}$ sequence still contains a trend component.⁵


CASE 3 


The nonstationary $\{y_t\}$ and $\{z_t\}$ sequences are integrated of the same order and the
residual sequence contains a stochastic trend. This is the case in which the regression
is spurious. The results from such spurious regressions are meaningless in that
all errors are permanent. In this case, it is often recommended that the regression
equation be estimated in first differences. Consider the first difference of (4.5):

$$\Delta y_t = a_1 \Delta z_t + \Delta e_t$$

Since $y_t$, $z_t$, and $e_t$ each contain unit roots, the first difference of each is stationary.
Hence, the usual asymptotic results apply. Of course, if one of the trends is
deterministic and the other is stochastic, first-differencing each is not appropriate.
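
To see what differencing does in Case 3, the hedged sketch below regresses $\Delta y_t$ on $\Delta z_t$ for pairs of independent random walks and records how often $a_1 = 0$ is rejected. The sample length, the 1.96 critical value, the standard normal errors, and the function names are illustrative assumptions.

```python
import numpy as np

def no_intercept_t_stat(dy, dz):
    """t-statistic for a_1 = 0 in  dy_t = a_1*dz_t + u_t  (no intercept)."""
    a1_hat = (dz @ dy) / (dz @ dz)
    resid = dy - a1_hat * dz
    s2 = resid @ resid / (dy.size - 1)
    return a1_hat / np.sqrt(s2 / (dz @ dz))

def differenced_rejection_rate(T=100, reps=1000, crit=1.96, seed=0):
    """Share of replications rejecting a_1 = 0 when the regression is in first differences."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        y = np.cumsum(rng.standard_normal(T))   # independent random walks
        z = np.cumsum(rng.standard_normal(T))
        rejections += abs(no_intercept_t_stat(np.diff(y), np.diff(z))) > crit
    return rejections / reps

if __name__ == "__main__":
    print(differenced_rejection_rate())
```

In first differences both variables are stationary, so the rejection rate should settle near the nominal 5% level rather than the much higher rates found for regressions in levels.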


CASE 4

The nonstationary $\{y_t\}$ and $\{z_t\}$ sequences are integrated of the same order and the
residual sequence is stationary. In this circumstance, $\{y_t\}$ and $\{z_t\}$ are cointegrated.
A trivial example of a cointegrated system occurs if $\varepsilon_{zt}$ and $\varepsilon_{yt}$ are perfectly
correlated. If $\varepsilon_{zt} = \varepsilon_{yt}$, then (4.8) can be set equal to zero (which is stationary) by
setting $a_1 = 1$. To consider a more interesting example, suppose that both $z_t$ and $y_t$
are the random walk plus noise processes:

$$y_t = \mu_t + \varepsilon_{yt}$$
$$z_t = \mu_t + \varepsilon_{zt}$$

where $\varepsilon_{yt}$ and $\varepsilon_{zt}$ are white-noise processes and $\mu_t$ is the random walk process
$\mu_t = \mu_{t-1} + \varepsilon_t$. Note that both $\{z_t\}$ and $\{y_t\}$ are unit root processes, but
$y_t - z_t = \varepsilon_{yt} - \varepsilon_{zt}$ is stationary.
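
The random walk plus noise example can be simulated directly. In the sketch below, the function name `common_trend_pair`, the sample length, the seed, and the standard normal shocks are assumptions chosen for illustration.

```python
import numpy as np

def common_trend_pair(T=200, seed=0):
    """Simulate y_t = mu_t + eps_yt and z_t = mu_t + eps_zt with a shared random walk mu_t."""
    rng = np.random.default_rng(seed)
    mu = np.cumsum(rng.standard_normal(T))   # mu_t = mu_{t-1} + eps_t
    eps_y = rng.standard_normal(T)           # white noise specific to y
    eps_z = rng.standard_normal(T)           # white noise specific to z
    return mu + eps_y, mu + eps_z, eps_y, eps_z

y, z, eps_y, eps_z = common_trend_pair()
gap = y - z
# The shared trend cancels: y_t - z_t equals eps_yt - eps_zt period by period
print(np.allclose(gap, eps_y - eps_z))
# Sample variances over the first half and over the full sample, for y and for the gap
print(np.var(y[:100]), np.var(y), np.var(gap[:100]), np.var(gap))
```

Each series wanders with the common trend, while the difference between them contains only the two white-noise terms.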
All of Chapter 6 is devoted to the issue of cointegrated variables. For now, it is 
sufficient to note that pretesting the variables in a regression for nonstationarity is 
extremely important. Estimating a regression in the form of (4.5) is meaningless if 


WORKSHEET 4.2 Spurious Regressions: Example 2

Consider the two random walk plus drift processes:

$$y_t = 0.2 + y_{t-1} + \varepsilon_{yt} \qquad z_t = -0.1 + z_{t-1} + \varepsilon_{zt}$$

Again, the $\{\varepsilon_{yt}\}$ and $\{\varepsilon_{zt}\}$ sequences are independent, so that the regression of $y_t$ on
$z_t$ is spurious. The scatter plot of $y_t$ against $z_t$ strongly suggests that the two series
are related. It is the deterministic time trend that causes the sustained increase in $y_t$
and sustained decrease in $z_t$. The residuals from the regression equation $y_t = -2z_t + e_t$
are nonstationary.

(Graphs: “Scatter plot of $y_t$ and $z_t$” and “Regression residuals” for $t$ = 0 to 100.)


Cases 2 or 3 apply. If the variables are cointegrated, the results of Chapter 6 apply. 
The remainder of this chapter considers the formal test procedures for the presence 
of unit roots and/or deterministic time trends. 


2. DICKEY-FULLER TESTS 


The last section outlined a simple procedure to determine whether $a_1 = 1$ in the
model $y_t = a_1 y_{t-1} + \varepsilon_t$. Begin by subtracting $y_{t-1}$ from each side of the equation in
order to write the equivalent form: $\Delta y_t = \gamma y_{t-1} + \varepsilon_t$, where $\gamma = a_1 - 1$. Of course,
testing the hypothesis $a_1 = 1$ is equivalent to testing the hypothesis $\gamma = 0$. Dickey and
Fuller (1979) actually consider three different regression equations that can be used
to test for the presence of a unit root:

$$\Delta y_t = \gamma y_{t-1} + \varepsilon_t \tag{4.9}$$
$$\Delta y_t = a_0 + \gamma y_{t-1} + \varepsilon_t \tag{4.10}$$
$$\Delta y_t = a_0 + \gamma y_{t-1} + a_2 t + \varepsilon_t \tag{4.11}$$


The difference between the three regressions concerns the presence of the
deterministic elements $a_0$ and $a_2 t$. The first is a pure random walk model, the second
adds an intercept or drift term, and the third includes both a drift and linear time
trend.

The parameter of interest in all the regression equations is $\gamma$; if $\gamma = 0$, the $\{y_t\}$
sequence contains a unit root. The test involves estimating one (or more) of the
equations above using OLS in order to obtain the estimated value of $\gamma$ and associated
standard error. Comparing the resulting t-statistic with the appropriate value
reported in the Dickey-Fuller tables allows the researcher to determine whether to
accept or reject the null hypothesis $\gamma = 0$.

Recall that in (4.3), the estimate of $y_t = a_1 y_{t-1} + \varepsilon_t$ was such that $a_1 = 0.9546$ with
a standard error of 0.030. Clearly, the OLS regression in the form $\Delta y_t = \gamma y_{t-1} + \varepsilon_t$
will yield an estimate of $\gamma$ equal to -0.0454 with the same standard error of 0.030.
Hence, the associated t-statistic for the hypothesis $\gamma = 0$ is -1.5133 (-0.0454/0.03 =
-1.5133).

The methodology is precisely the same, regardless of which of the three forms of
the equations is estimated. However, be aware that the critical values of the
t-statistics do depend on whether an intercept and/or time trend is included in the
regression equation. In their Monte Carlo study, Dickey and Fuller (1979) found that the
critical values for $\gamma = 0$ depend on the form of the regression and sample size. The
statistics labeled $\tau$, $\tau_\mu$, and $\tau_\tau$ are the appropriate statistics to use for Equations
(4.9), (4.10), and (4.11), respectively.

Now, look at Table A at the end of this book. With 100 observations, there are
three different critical values for the t-statistic for $\gamma = 0$. For a regression without the
intercept and trend terms ($a_0 = a_2 = 0$), use the section labeled $\tau$. With 100
observations, the critical values for the t-statistic are -1.61, -1.95, and -2.60 at the 10, 5,
and 1% significance levels, respectively. Thus, in the hypothetical example with
$\gamma = -0.0454$ and a standard error of 0.03 (so that $t = -1.5133$), it is not possible to reject
the null of a unit root at conventional significance levels. Note that the appropriate
critical values depend on sample size. As in most hypothesis tests, for any given
level of significance, the critical values of the t-statistic decrease as sample size
increases.

Including an intercept term but not a trend term (only $a_2 = 0$) necessitates the use
of the critical values in the section labeled $\tau_\mu$. Estimating (4.4) in the form
$\Delta y_t = a_0 + \gamma y_{t-1} + \varepsilon_t$ necessarily yields a value of $\gamma$ equal to (0.9247 - 1) = -0.0753 with a
standard error of 0.037. The appropriate calculation for the $\tau_\mu$ statistic yields
-0.0753/0.037 = -2.035. If we read from the appropriate row of Table A, with the
same 100 observations, the critical values are -2.58, -2.89, and -3.51 at the 10, 5,
and 1% significance levels, respectively. Again, the null of a unit root cannot be
rejected at conventional significance levels. Finally, with both intercept and trend,
use the critical values in the section labeled $\tau_\tau$; now the critical values are -3.45
and -4.04 at the 5 and 1% significance levels, respectively. The equation was not
estimated using a time trend; inspection of Figure 4.1 indicates there is little reason
to include a deterministic trend in the estimating equation.
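
The mechanics of the test can be collected in one small routine. The following is a sketch, not a substitute for a tested econometrics package: the function names, the `model` labels, and the dictionary layout are assumptions made for this example, while the critical values are the ones quoted above for 100 observations.

```python
import numpy as np

# Critical values quoted in the text for 100 observations.
# No 10% value is listed for the trend case because the text does not report one.
CRITICAL_VALUES = {
    "none": {0.10: -1.61, 0.05: -1.95, 0.01: -2.60},       # tau,     equation (4.9)
    "intercept": {0.10: -2.58, 0.05: -2.89, 0.01: -3.51},  # tau_mu,  equation (4.10)
    "trend": {0.05: -3.45, 0.01: -4.04},                   # tau_tau, equation (4.11)
}

def dickey_fuller(y, model="intercept"):
    """Estimate (4.9), (4.10), or (4.11) by OLS; return (gamma_hat, std_error, t_stat)."""
    if model not in CRITICAL_VALUES:
        raise ValueError("model must be 'none', 'intercept', or 'trend'")
    y = np.asarray(y, dtype=float)
    dy, lag = np.diff(y), y[:-1]
    cols = [lag]                                   # gamma is always the first coefficient
    if model in ("intercept", "trend"):
        cols.append(np.ones(lag.size))             # a_0
    if model == "trend":
        cols.append(np.arange(1, lag.size + 1, dtype=float))   # a_2 * t
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    s2 = resid @ resid / (dy.size - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return coef[0], se, coef[0] / se

def reject_unit_root(stat, model, level=0.05):
    """Reject gamma = 0 when the statistic lies below the (negative) critical value."""
    return stat < CRITICAL_VALUES[model][level]

# The two hypothetical examples from the text
print(reject_unit_root(-0.0454 / 0.03, "none", 0.10))        # based on (4.3)
print(reject_unit_root(-0.0753 / 0.037, "intercept", 0.10))  # based on (4.4)
```

The comparison is one-sided: the null of a unit root is rejected only when the statistic is more negative than the tabulated value for the model actually estimated.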

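As discussed in the next section, these critical values are unchanged if (4.9),
(4.10), and (4.11) are replaced by the autoregressive processes:⁶

$$\Delta y_t = \gamma y_{t-1} + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t \tag{4.12}$$

$$\Delta y_t = a_0 + \gamma y_{t-1} + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t \tag{4.13}$$

$$\Delta y_t = a_0 + \gamma y_{t-1} + a_2 t + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t \tag{4.14}$$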

The same $\tau$, $\tau_\mu$, and $\tau_\tau$ statistics are all used to test the hypotheses $\gamma = 0$. Dickey
and Fuller (1981) provide three additional F-statistics (called $\phi_1$, $\phi_2$, and $\phi_3$) to test
joint hypotheses on the coefficients. With (4.10) or (4.13), the null hypothesis
$\gamma = a_0 = 0$ is tested using the $\phi_1$ statistic. Including a time trend in the regression, so
that (4.11) or (4.14) is estimated, the joint hypothesis $a_0 = \gamma = a_2 = 0$ is tested using
the $\phi_2$ statistic and the joint hypothesis $\gamma = a_2 = 0$ is tested using the $\phi_3$ statistic.

The $\phi_1$, $\phi_2$, and $\phi_3$ statistics are constructed in exactly the same way as ordinary
F-tests are:

$$\phi_i = \frac{[\mathrm{RSS(restricted)} - \mathrm{RSS(unrestricted)}]/r}{\mathrm{RSS(unrestricted)}/(T - k)}$$

where RSS(restricted) and RSS(unrestricted) = the sums of the squared residuals
from the restricted and unrestricted models

r = number of restrictions
T = number of usable observations
k = number of parameters estimated in the unrestricted model

Hence, T - k = degrees of freedom in the unrestricted model.

Comparing the calculated value of $\phi_i$ to the appropriate value reported in Dickey
and Fuller (1981) allows you to determine the significance level at which the
restriction is binding. The null hypothesis is that the data are generated by the
restricted model and the alternative hypothesis is that the data are generated by the
unrestricted model. If the restriction is not binding, RSS(restricted) should be close
to RSS(unrestricted) and $\phi_i$ should be small; hence, large values of $\phi_i$ suggest a
binding restriction and rejection of the null hypothesis. Thus, if the calculated value
of $\phi_i$ is smaller than that reported by Dickey and Fuller, you can accept the
restricted model (i.e., you do not reject the null hypothesis that the restriction is not
binding). If the calculated value of $\phi_i$ is larger than reported by Dickey and Fuller,
you can reject the null hypothesis and conclude that the restriction is binding. The
critical values of the three $\phi_i$ statistics are reported in Table C at the end of this text.

Finally, it is possible to test hypotheses concerning the significance of the drift
term $a_0$ and time trend $a_2$. Under the null hypothesis $\gamma = 0$, the test for the presence
of the time trend in (4.14) is given by the $\tau_{\beta\tau}$ statistic. Thus, this statistic is the test
$a_2 = 0$ given that $\gamma = 0$. To test the hypothesis $a_0 = 0$, use the $\tau_{\alpha\tau}$ statistic if you
estimate (4.14) and the $\tau_{\alpha\mu}$ statistic if you estimate (4.13). The complete set of test
statistics and their critical values for a sample size of 100 are summarized in Table
4.1.


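Table 4.1 Summary of the Dickey-Fuller Tests

Model                                                         Hypothesis                        Test Statistic         Critical Values for 95% and 99% Confidence Intervals
$\Delta y_t = a_0 + \gamma y_{t-1} + a_2 t + \varepsilon_t$     $\gamma = 0$                      $\tau_\tau$            -3.45 and -4.04
                                                              $a_0 = 0$ given $\gamma = 0$      $\tau_{\alpha\tau}$    3.11 and 3.78
                                                              $a_2 = 0$ given $\gamma = 0$      $\tau_{\beta\tau}$     2.79 and 3.53
                                                              $\gamma = a_2 = 0$                $\phi_3$               6.49 and 8.73
                                                              $a_0 = \gamma = a_2 = 0$          $\phi_2$               4.88 and 6.50
$\Delta y_t = a_0 + \gamma y_{t-1} + \varepsilon_t$             $\gamma = 0$                      $\tau_\mu$             -2.89 and -3.51
                                                              $a_0 = 0$ given $\gamma = 0$      $\tau_{\alpha\mu}$     2.54 and 3.22
                                                              $a_0 = \gamma = 0$                $\phi_1$               4.71 and 6.70
$\Delta y_t = \gamma y_{t-1} + \varepsilon_t$                   $\gamma = 0$                      $\tau$                 -1.95 and -2.60

Notes: Critical values are for a sample size of 100.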


An Example 


To illustrate the use of the various test statistics, Dickey and Fuller (1981) use
quarterly values of the logarithm of the Federal Reserve Board’s Production Index over
the 1950:I to 1977:IV period to estimate the following three equations:

$$\Delta y_t = \underset{(0.15)}{0.52} + \underset{(0.00034)}{0.00120}\,t - \underset{(0.033)}{0.119}\,y_{t-1} + \underset{(0.081)}{0.498}\,\Delta y_{t-1} + \varepsilon_t, \qquad \mathrm{RSS} = 0.056448 \tag{4.15}$$

$$\Delta y_t = \underset{(0.0025)}{0.0054} + \underset{(0.083)}{0.447}\,\Delta y_{t-1} + \varepsilon_t, \qquad \mathrm{RSS} = 0.063211 \tag{4.16}$$

$$\Delta y_t = \underset{(0.079)}{0.511}\,\Delta y_{t-1} + \varepsilon_t, \qquad \mathrm{RSS} = 0.065966 \tag{4.17}$$

where RSS = residual sum of squares, and standard errors are in parentheses.

To test the null hypothesis that the data are generated by (4.17) against the
alternative that (4.15) is the “true” model, use the $\phi_2$ statistic. Dickey and Fuller test the
null hypothesis $a_0 = a_2 = \gamma = 0$ as follows. Note that the residual sums of squares of
the restricted and unrestricted models are 0.065966 and 0.056448 and the null
hypothesis entails three restrictions. With 110 usable observations and four estimated
parameters, the unrestricted model contains 106 degrees of freedom. Since
0.056448/106 = 0.000533, the $\phi_2$ statistic is given by

$$\phi_2 = (0.065966 - 0.056448)/[3(0.000533)] = 5.95$$

With 110 observations, the critical value of $\phi_2$ calculated by Dickey and Fuller is
5.59 at the 2.5% significance level. Hence, it is possible to reject the null hypothesis
of a random walk against the alternative that the data contain an intercept and/or
a unit root and/or a deterministic time trend (i.e., rejecting $a_0 = a_2 = \gamma = 0$ means
that one or more of these parameters does not equal zero).

Dickey and Fuller also test the null hypothesis $a_2 = \gamma = 0$ given the alternative of
(4.15). Now if we view (4.16) as the restricted model and (4.15) as the unrestricted
model, the $\phi_3$ statistic is calculated as

$$\phi_3 = (0.063211 - 0.056448)/[2(0.000533)] = 6.34$$

With 110 observations, the critical value of $\phi_3$ is 6.49 at the 5% significance
level and 5.47 at the 10% significance level.⁷ At the 10% level, they reject the null
hypothesis. However, at the 5% level, the calculated value of $\phi_3$ is smaller than the
critical value; they do not reject the null hypothesis that the data contain a unit root
and/or deterministic time trend.

To compare with the $\tau_\tau$ test (i.e., the hypothesis that only $\gamma = 0$), note that

$$\tau_\tau = -0.119/0.033 = -3.61$$

which rejects the null of a unit root at the 5% level.
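
The arithmetic in this example is easy to check. The sketch below applies the F-type formula to the residual sums of squares reported in (4.15) through (4.17); the function name `phi_statistic` is an assumption made for illustration.

```python
def phi_statistic(rss_restricted, rss_unrestricted, r, T, k):
    """[(RSS_r - RSS_u)/r] / [RSS_u/(T - k)], the form shared by phi_1, phi_2, and phi_3."""
    numerator = (rss_restricted - rss_unrestricted) / r
    denominator = rss_unrestricted / (T - k)
    return numerator / denominator

# (4.17) restricted versus (4.15) unrestricted: three restrictions
phi_2 = phi_statistic(0.065966, 0.056448, r=3, T=110, k=4)
# (4.16) restricted versus (4.15) unrestricted: two restrictions
phi_3 = phi_statistic(0.063211, 0.056448, r=2, T=110, k=4)
# t-ratio on the coefficient of y_{t-1} in (4.15)
tau_tau = -0.119 / 0.033

print(round(phi_2, 2), round(phi_3, 2), round(tau_tau, 2))
```

Because the code does not round 0.056448/106 to 0.000533, the two $\phi$ values it prints come out about 0.01 higher than the 5.95 and 6.34 shown above. The comparisons with the critical values are unaffected.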


3. EXTENSIONS OF THE DICKEY-FULLER TEST 


Not all time-series processes can be well represented by the first-order
autoregressive process $\Delta y_t = a_0 + \gamma y_{t-1} + a_2 t + \varepsilon_t$. It is
possible to use the Dickey-Fuller tests in higher-order equations such as (4.12),
(4.13), and (4.14). Consider the pth-order autoregressive process:

$$y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + a_3 y_{t-3} + \cdots + a_{p-2} y_{t-p+2} + a_{p-1} y_{t-p+1} + a_p y_{t-p} + \varepsilon_t \tag{4.18}$$


To best understand the methodology of the augmented Dickey-Fuller test, add
and subtract $a_p y_{t-p+1}$ to obtain:

$$y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + a_3 y_{t-3} + \cdots + a_{p-2} y_{t-p+2} + (a_{p-1} + a_p) y_{t-p+1} - a_p \Delta y_{t-p+1} + \varepsilon_t$$

Next, add and subtract $(a_{p-1} + a_p) y_{t-p+2}$ to obtain

$$y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + a_3 y_{t-3} + \cdots - (a_{p-1} + a_p) \Delta y_{t-p+2} - a_p \Delta y_{t-p+1} + \varepsilon_t$$


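Continuing in this fashion, we get

$$\Delta y_t = a_0 + \gamma y_{t-1} + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t \tag{4.19}$$

where

$$\gamma = -\Bigl(1 - \sum_{i=1}^{p} a_i\Bigr) \qquad \text{and} \qquad \beta_i = -\sum_{j=i}^{p} a_j$$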


In (4.19), the coefficient of interest is $\gamma$; if $\gamma = 0$, the equation is
entirely in first differences and so has a unit root. We can test for the presence
of a unit root using the same Dickey-Fuller statistics discussed above. Again, the
appropriate statistic to use depends on the deterministic components included in
the regression equation. Without an intercept or trend, use the $\tau$ statistic; with
only the intercept, use the $\tau_\mu$ statistic; and with both an intercept and trend,
use the $\tau_\tau$ statistic. It is worthwhile pointing out that the results here are
perfectly consistent with our study of difference equations in Chapter 1. If the
coefficients of a difference equation sum to 1, at least one characteristic root is
unity. Here, if $\sum a_i = 1$, $\gamma = 0$ and the system has a unit root.
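The mapping from the levels form (4.18) to the differenced form (4.19) can be sketched in Python. The helper names (`ar_to_adf`, `simulate_levels`, `simulate_adf`), the starting values, and the shocks are illustrative assumptions. The coefficient signs are the ones implied by the add-and-subtract steps, $\gamma = \sum a_i - 1$ and $\beta_i = -(a_i + \cdots + a_p)$, and the two simulations show that both forms generate the same path.

```python
def ar_to_adf(a):
    """Map AR(p) slopes [a_1, ..., a_p] to (gamma, [beta_2, ..., beta_p])."""
    gamma = sum(a) - 1.0                               # gamma = -(1 - sum of the a_i)
    betas = [-sum(a[i:]) for i in range(1, len(a))]    # beta_i = -(a_i + ... + a_p)
    return gamma, betas

def simulate_levels(a0, a, shocks, init):
    """Iterate y_t = a0 + a_1 y_{t-1} + ... + a_p y_{t-p} + e_t from p starting values."""
    y = list(init)
    for e in shocks:
        lags = y[::-1][:len(a)]                        # y_{t-1}, ..., y_{t-p}
        y.append(a0 + sum(ai * yi for ai, yi in zip(a, lags)) + e)
    return y

def simulate_adf(a0, gamma, betas, shocks, init):
    """Iterate the differenced form: gamma on y_{t-1}, beta_{k+1} on the k-th lagged change."""
    y = list(init)
    for e in shocks:
        dy = a0 + gamma * y[-1] + e
        for k, b in enumerate(betas, start=1):
            dy += b * (y[-k] - y[-k - 1])
        y.append(y[-1] + dy)
    return y

a = [0.5, 0.3, 0.2]                  # slopes sum to one, so gamma is zero (a unit root)
gamma, betas = ar_to_adf(a)
shocks = [0.5, -0.2, 0.1, 0.3, -0.4]
init = [1.0, 1.2, 0.9]
levels_path = simulate_levels(0.1, a, shocks, init)
adf_path = simulate_adf(0.1, gamma, betas, shocks, init)
print(gamma, betas)
```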

Note that the Dickey-Fuller tests assume that the errors are independent and
have a constant variance. This raises four important problems related to the fact
that we do not know the true data-generating process. First, the true
data-generating process may contain both autoregressive and moving average
components. We need to know how to conduct the test if the order of the moving
average terms (if any) is unknown. Second, we cannot properly estimate $\gamma$ and its
standard error unless all the autoregressive terms are included in the estimating
equation. Clearly, the simple regression $\Delta y_t = a_0 + \gamma y_{t-1} + \varepsilon_t$ is
inadequate to this task if (4.18) is the true data-generating process. However, the
true order of the autoregressive process is usually unknown to the researcher, so
that the problem is to select the appropriate lag length. The third problem stems
from the fact that the Dickey-Fuller test considers only a single unit root.
However, a pth-order autoregression has p characteristic roots; if there are
$m \le p$ unit roots, the series needs to be differenced m times to achieve
stationarity. The fourth problem is that it may not be known whether an intercept
and/or time trend belongs in (4.18). We consider the first three problems below.
Section 7 is concerned with the issue of the appropriate deterministic regressors.
Since an invertible MA model can be transformed into an autoregressive model,
the procedure can be generalized to allow for moving average components. Let the
$\{y_t\}$ sequence be generated from the mixed autoregressive/moving average process:

$$A(L) y_t = C(L) \varepsilon_t$$

where $A(L)$ and $C(L)$ = polynomials of orders p and q, respectively

If the roots of $C(L)$ are outside the unit circle, we can write the $\{y_t\}$ sequence
as the autoregressive process:

$$A(L) y_t / C(L) = \varepsilon_t$$

or, defining $D(L) = A(L)/C(L)$, we get

$$D(L) y_t = \varepsilon_t$$


Even though $D(L)$ will generally be an infinite-order polynomial, in principle we
can use the same technique as used to obtain (4.19) to form the infinite-order
autoregressive model:

$$\Delta y_t = \gamma y_{t-1} + \sum_{i=2}^{\infty} \beta_i \Delta y_{t-i+1} + \varepsilon_t \tag{4.20}$$


As it stands, (4.20) is an infinite-order autoregression that cannot be estimated
using a finite data set. Fortunately, Said and Dickey (1984) have shown that an
unknown ARIMA(p, 1, q) process can be well approximated by an ARIMA(n, 1, 0)
autoregression of order n no more than $T^{1/3}$. Thus, we can solve the first problem
by using a finite-order autoregression to approximate (4.20). The test for $\gamma = 0$
can be conducted using the aforementioned Dickey-Fuller $\tau$, $\tau_\mu$, or $\tau_\tau$ test
statistics.

Now, the second problem concerning the appropriate lag length arises. Including
too many lags reduces the power of the test to reject the null of a unit root since
the increased number of lags necessitates the estimation of additional parameters
and a loss of degrees of freedom. The degrees of freedom decrease since the number
of parameters estimated has increased and because the number of usable
observations has decreased. (We lose one observation for each additional lag
included in the autoregression.) On the other hand, too few lags will not
appropriately capture the actual error process, so that $\gamma$ and its standard error
will not be well estimated.

How does the researcher select the appropriate lag length in such circumstances? 
One approach is to start with a relatively long lag length and pare down the model 
by the usual t-test and/or F-tests. For example, one could estimate Equation (4.20) 
using a lag length of n*. If the t-statistic on lag n* is insignificant at some specified 
critical value, reestimate the regression using a lag length of n* — 1. Repeat the 
process until the lag is significantly different from zero. In the pure autoregressive 
case, such a procedure will yield the true lag length with an asymptotic probability 
of unity, provided that the initial choice of lag length includes the true length. With 
seasonal data, the process is a bit different. For example, using quarterly data, one 
could start with 3 years of lags (n = 12). If the t-statistic on lag 12 is insignificant at 
some specified critical value and an F-test indicates that lags 9 to 12 are also in- 
significant, move to lags 1 to 8. Repeat the process for lag 8 and lags 5 to 8 until a 
reasonable lag length has been determined. 

Once a tentative lag length has been determined, diagnostic checking should be 
conducted. As always, plotting the residuals is a most important diagnostic tool. 
There should not appear to be any strong evidence of structural change or serial 
correlation. Moreover, the correlogram of the residuals should appear to be white 
noise. The Ljung-Box Q-statistic should not reveal any significant autocorrelations 
among the residuals. It is inadvisable to use the alternative procedure of beginning 
with the most parsimonious model and keep adding lags until a significant lag is 
found. In Monte Carlo studies, this procedure is biased toward selecting a value of 
n that is less than the true value. 
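The general-to-specific search described above can be sketched with the standard library alone. Everything named here is an illustrative assumption: the helpers `solve`, `ols_tstats`, `adf_regression`, and `select_lag_length`, the cutoff of 1.96 (the text only says "some specified critical value"), and the simulated series. The regression includes an intercept; the diagnostic checks on the residuals are not shown.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * z for x, z in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols_tstats(X, y):
    """OLS coefficients and conventional t-ratios for a list of regressor rows X."""
    k, n = len(X[0]), len(y)
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    rss = sum((v - sum(bi * xi for bi, xi in zip(b, r))) ** 2 for r, v in zip(X, y))
    s2 = rss / (n - k)
    t = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        var_i = s2 * solve(xtx, unit)[i]          # i-th diagonal element of s2 * (X'X)^-1
        t.append(b[i] / var_i ** 0.5)
    return b, t

def adf_regression(y, n_lags):
    """Regress the change in y on an intercept, y_{t-1}, and n_lags lagged changes."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]      # dy[t-1] holds the change at t
    X, z = [], []
    for t in range(n_lags + 1, len(y)):
        X.append([1.0, y[t - 1]] + [dy[t - 1 - j] for j in range(1, n_lags + 1)])
        z.append(dy[t - 1])
    return ols_tstats(X, z)

def select_lag_length(y, max_lags, crit=1.96):
    """Start at max_lags and drop the longest lag while its |t| is below crit."""
    for n_lags in range(max_lags, 0, -1):
        _, t = adf_regression(y, n_lags)
        if abs(t[-1]) >= crit:
            return n_lags
    return 0

random.seed(1)
y = [0.0]
for _ in range(199):
    y.append(y[-1] + random.gauss(0.0, 1.0))              # simulated random walk
print(select_lag_length(y, max_lags=8))
```

The t-ratio on $y_{t-1}$ returned by `adf_regression` is the statistic to compare with the Dickey-Fuller critical values; only the t-ratio on the longest lagged change is used for the lag search.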


Multiple Roots 


Dickey and Pantula (1987) suggest a simple extension of the basic procedure if
more than one unit root is suspected. In essence, the methodology entails nothing
more than performing Dickey-Fuller tests on successive differences of $\{y_t\}$. When
exactly one root is suspected, the Dickey-Fuller procedure is to estimate an
equation such as $\Delta y_t = a_0 + \gamma y_{t-1} + \varepsilon_t$. Instead, if two roots are
suspected, estimate the equation:

$$\Delta^2 y_t = a_0 + \beta_1 \Delta y_{t-1} + \varepsilon_t \tag{4.21}$$


Use the appropriate statistic (i.e., $\tau$, $\tau_\mu$, or $\tau_\tau$ depending on the
deterministic elements actually included in the regression) to determine whether
$\beta_1$ is significantly different from zero. If you cannot reject the null hypothesis
that $\beta_1 = 0$, conclude that the $\{y_t\}$ sequence is I(2). If $\beta_1$ does differ from
zero, go on to determine whether there is a single unit root by estimating

$$\Delta^2 y_t = a_0 + \beta_1 \Delta y_{t-1} + \beta_2 y_{t-1} + \varepsilon_t \tag{4.22}$$


Since there are not two unit roots, you should find that $\beta_1$ and/or $\beta_2$ differ
from zero. Under the null hypothesis of a single unit root, $\beta_1 < 0$ and
$\beta_2 = 0$; under the alternative hypothesis, $\{y_t\}$ is stationary, so that $\beta_1$
and $\beta_2$ are both negative. Thus, estimate (4.22) and use the Dickey-Fuller critical
values to test the null hypothesis $\beta_2 = 0$. If you reject this null hypothesis,
conclude that $\{y_t\}$ is stationary.

As a rule of thumb, economic series do not need to be differenced more than two
times. However, in the odd case in which at most r unit roots are suspected, the
procedure is to first estimate

$$\Delta^r y_t = a_0 + \beta_1 \Delta^{r-1} y_{t-1} + \varepsilon_t$$


If $\Delta^r y_t$ is stationary, you should find that $-2 < \beta_1 < 0$. If the
Dickey-Fuller critical values for $\beta_1$ are such that it is not possible to reject
the null of a unit root, you accept the hypothesis that $\{y_t\}$ contains r unit
roots. If you reject this null of exactly r unit roots, the next step is to test for
r - 1 roots by estimating

$$\Delta^r y_t = a_0 + \beta_1 \Delta^{r-1} y_{t-1} + \beta_2 \Delta^{r-2} y_{t-1} + \varepsilon_t$$


If both $\beta_1$ and $\beta_2$ differ from zero, reject the null hypothesis of r - 1 unit
roots. You can use the Dickey-Fuller statistics to test the null of exactly r - 1
unit roots if the t statistics for $\beta_1$ and $\beta_2$ are both statistically different
from zero. If you can reject this null, the next step is to form

$$\Delta^r y_t = a_0 + \beta_1 \Delta^{r-1} y_{t-1} + \beta_2 \Delta^{r-2} y_{t-1} + \beta_3 \Delta^{r-3} y_{t-1} + \varepsilon_t$$


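As long as it is possible to reject the null hypothesis that the various values of
the $\beta_i$ are zero, continue toward the equation:

$$\Delta^r y_t = a_0 + \beta_1 \Delta^{r-1} y_{t-1} + \beta_2 \Delta^{r-2} y_{t-1} + \beta_3 \Delta^{r-3} y_{t-1} + \cdots + \beta_r y_{t-1} + \varepsilon_t$$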


Continue in this fashion until it is not possible to reject the null of a unit root
or the $\{y_t\}$ series is shown to be stationary. Notice that this procedure is quite
different from the sequential testing for successively greater numbers of unit
roots. It might seem tempting to test for a single unit root and, if the null cannot
be rejected, go on to test for the presence of a second unit root. In repeated
samples, this method tends to select too few roots.
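As a rough illustration of how the regressions in the Dickey-Pantula sequence are assembled, the sketch below builds the dependent variable $\Delta^r y_t$ and the regressors $\Delta^{r-1} y_{t-1}, \ldots, \Delta^{r-s} y_{t-1}$ for step s. The function names and the toy series are assumptions made for the example; estimation and the comparison with Dickey-Fuller critical values are left to whatever regression routine is at hand.

```python
def difference(y, d):
    """Apply the first-difference operator d times."""
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y

def dickey_pantula_design(y, r, step):
    """Dependent variable and regressor rows for step `step` (1, ..., r).

    Step s regresses the r-th difference of y_t on the (r-1)-th through
    (r-s)-th differences, each dated t-1. An intercept can be appended by the caller.
    """
    diffs = {d: difference(y, d) for d in range(r - step, r + 1)}
    dep, X = [], []
    for t in range(r, len(y)):
        dep.append(diffs[r][t - r])               # element for date t of the r-th difference
        # A d-times differenced series starts at date d, so date t-1 sits at index t-1-d.
        X.append([diffs[r - k][t - 1 - (r - k)] for k in range(1, step + 1)])
    return dep, X

y = [t ** 2 for t in range(8)]                    # toy series whose second difference is constant
dep, X = dickey_pantula_design(y, r=2, step=2)
print(dep)
print(X[:2])
```

With r = 2, step 1 reproduces the regressors of (4.21) and step 2 those of (4.22).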


Seasonal Unit Roots 


You will recall that the best-fitting model for the monthly Spanish tourism data
used in Chapter 2 had the form:

$$(1 - L^{12})(1 - L) y_t = (1 + \beta_1 L)(1 + \beta_{12} L^{12}) \varepsilon_t$$


Tourist visits to Spain have a unit root and seasonal unit root. Since seasonality
is a key feature of many economic series, a sizable literature has developed to test
for seasonal unit roots. Before proceeding, note that the first difference of a
seasonal unit root process will not be stationary. To keep matters simple, suppose
that the quarterly observations of $\{y_t\}$ are generated by

$$y_t = y_{t-4} + \varepsilon_t$$


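Here, the fourth difference of $\{y_t\}$ is stationary; using the notation of Chapter 2,
we can write $\Delta_4 y_t = \varepsilon_t$. Given the initial condition
$y_0 = y_{-1} = \cdots = 0$, the solution for $y_t$ is:

$$y_t = \varepsilon_t + \varepsilon_{t-4} + \varepsilon_{t-8} + \cdots$$

so that

$$y_t - y_{t-1} = \sum_{i=0}^{t/4} \varepsilon_{t-4i} - \sum_{i=0}^{t/4} \varepsilon_{t-1-4i}$$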


Hence, $\Delta y_t$ equals the difference between two stochastic trends. Since the
variance of $\Delta y_t$ increases without limit as t increases, the $\{\Delta y_t\}$ sequence
is not stationary. However, the seasonal difference of a unit root process may be
stationary. For example, if $\{y_t\}$ is generated by $y_t = y_{t-1} + \varepsilon_t$, the
fourth difference (i.e., $\Delta_4 y_t = \varepsilon_t + \varepsilon_{t-1} + \varepsilon_{t-2} + \varepsilon_{t-3}$) is
stationary. However, the variance of the fourth difference is larger than the
variance of the first difference. The point is that the Dickey-Fuller procedure must
be modified in order to test for seasonal unit roots and distinguish between
seasonal versus nonseasonal roots.
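The two differencing identities used in this argument are easy to confirm numerically. The sketch below is illustrative only: the shocks are simulated, the helper `lag_difference` is a made-up name, and zero initial conditions are assumed as in the text.

```python
import random

def lag_difference(y, s):
    """Return y_t - y_{t-s} for t = s, ..., len(y) - 1."""
    return [y[t] - y[t - s] for t in range(s, len(y))]

random.seed(0)
eps = [random.gauss(0.0, 1.0) for _ in range(200)]

# Seasonal random walk: y_t = y_{t-4} + eps_t with zero initial conditions.
seasonal = []
for t, e in enumerate(eps):
    seasonal.append((seasonal[t - 4] if t >= 4 else 0.0) + e)

# Ordinary random walk: y_t = y_{t-1} + eps_t with a zero initial condition.
walk = []
for t, e in enumerate(eps):
    walk.append((walk[t - 1] if t >= 1 else 0.0) + e)

d4_seasonal = lag_difference(seasonal, 4)   # recovers eps_t, a stationary series
d4_walk = lag_difference(walk, 4)           # sum of the four most recent shocks
d1_seasonal = lag_difference(seasonal, 1)   # difference between two stochastic trends

print(max(abs(d4_seasonal[i] - eps[i + 4]) for i in range(len(d4_seasonal))))
print(max(abs(d4_walk[i] - sum(eps[i + 1:i + 5])) for i in range(len(d4_walk))))
```

Since the fourth difference of the ordinary random walk is a sum of four independent shocks, its variance is four times the shock variance, which is why it exceeds the variance of the first difference.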

There are several alternative ways to treat seasonality in a nonstationary
sequence. The most direct method occurs when the seasonal pattern is purely
deterministic. For example, let $D_1$, $D_2$, and $D_3$ represent quarterly seasonal
dummy variables such that the value of $D_i$ is unity in season i and zero otherwise.
Estimate the regression equation:

$$y_t = a_0 + \alpha_1 D_1 + \alpha_2 D_2 + \alpha_3 D_3 + \hat{y}_t \tag{4.23}$$

where $\hat{y}_t$ is the regression residual, so that $\hat{y}_t$ can be viewed as the
deseasonalized value of $y_t$.


Next, use these regression residuals to estimate the regression:

$$\Delta \hat{y}_t = \gamma \hat{y}_{t-1} + \sum_{i=2}^{p} \beta_i \Delta \hat{y}_{t-i+1} + \varepsilon_t$$

The null hypothesis of a unit root (i.e., $\gamma = 0$) can be tested using the
Dickey-Fuller $\tau_\mu$ statistic. Rejecting the null hypothesis is equivalent to
accepting the alternative that the $\{y_t\}$ sequence is stationary. The test is
possible as Dickey, Bell, and Miller (1986) show that the limiting distribution for
$\gamma$ is not affected by the removal of the deterministic seasonal components. If you
want to include a time trend in (4.23), use the $\tau_\tau$ statistic.
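The deseasonalizing step can be sketched without a regression package. Because (4.23) contains an intercept and a dummy for three of the four quarters, its fitted value in each quarter is simply that quarter's sample mean, so the residuals are deviations from quarter-specific means. The function name and the toy data are illustrative assumptions.

```python
def deseasonalize(y, period=4):
    """Residuals from regressing y on an intercept and seasonal dummies.

    The dummy regression is saturated, so its fitted values are the
    season-specific sample means.
    """
    means = []
    for s in range(period):
        obs = y[s::period]                    # all observations falling in season s
        means.append(sum(obs) / len(obs))
    return [v - means[t % period] for t, v in enumerate(y)]

y = [10, 20, 30, 40, 12, 22, 32, 42]          # two years of toy quarterly data
y_hat = deseasonalize(y)
print(y_hat)
```

The resulting series plays the role of $\hat{y}_t$ in the second-stage regression.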

If you suspect a seasonal unit root, it is necessary to use an alternative procedure.
To keep the notation simple, suppose you have quarterly observations on the $\{y_t\}$
sequence and want to test for the presence of a seasonal unit root. To explain the
methodology, note that the polynomial $(1 - \gamma L^4)$ can be factored, so that there
are four distinct characteristic roots:

$$(1 - \gamma L^4) = (1 - \gamma^{1/4} L)(1 + \gamma^{1/4} L)(1 - i\gamma^{1/4} L)(1 + i\gamma^{1/4} L) \tag{4.24}$$


If $y_t$ has a seasonal unit root, $\gamma = 1$. Equation (4.24) is a bit restrictive in
that it only allows for a unit root at an annual frequency. Hylleberg et al. (1990)
develop a clever technique that allows you to test for unit roots at various
frequencies; you can test for a unit root (i.e., a root at a zero frequency), unit
root at a semiannual frequency, or seasonal unit root. To understand the procedure,
suppose $y_t$ is generated by

$$A(L) y_t = \varepsilon_t$$

where $A(L)$ is a fourth-order polynomial such that

$$(1 - a_1 L)(1 + a_2 L)(1 - a_3 iL)(1 + a_4 iL) y_t = \varepsilon_t \tag{4.25}$$

Now, if $a_1 = a_2 = a_3 = a_4 = 1$, (4.25) is equivalent to setting $\gamma = 1$ in (4.24).
Hence, if $a_1 = a_2 = a_3 = a_4 = 1$, there is a seasonal unit root. Consider some of
the other possible cases:


CASE 1

If $a_1 = 1$, one homogeneous solution to (4.25) is $y_t = y_{t-1}$. As such, the $\{y_t\}$
sequence tends to repeat itself each and every period. This is the case of a
nonseasonal unit root.


CASE 2

If $a_2 = 1$, one homogeneous solution to (4.25) is $y_t + y_{t-1} = 0$. In this instance,
the sequence tends to replicate itself at 6-month intervals, so that there is a
semiannual unit root. For example, if $y_t = 1$, it follows that $y_{t+1} = -1$,
$y_{t+2} = +1$, $y_{t+3} = -1$, $y_{t+4} = 1$, etc.


CASE 3

If either $a_3$ or $a_4$ is equal to unity, the $\{y_t\}$ sequence has an annual cycle. For
example, if $a_3 = 1$, a homogeneous solution to (4.25) is $y_t = i y_{t-1}$. Thus, if
$y_t = 1$, $y_{t+1} = i$, $y_{t+2} = i^2 = -1$, $y_{t+3} = -i$, and $y_{t+4} = -i^2 = 1$, so that
the sequence replicates itself every fourth period.
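The factorization behind (4.25) and the three cases can be verified with complex arithmetic in plain Python. The helper names `polymul` and `hegy_polynomial` are illustrative; polynomials in L are stored as coefficient lists with the constant term first.

```python
def polymul(p, q):
    """Multiply two polynomials in L given as coefficient lists (constant term first)."""
    out = [0j] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def hegy_polynomial(a1, a2, a3, a4):
    """Coefficients of (1 - a1 L)(1 + a2 L)(1 - a3 i L)(1 + a4 i L)."""
    factors = [[1, -a1], [1, a2], [1, -a3 * 1j], [1, a4 * 1j]]
    poly = [1 + 0j]
    for f in factors:
        poly = polymul(poly, f)
    return poly

# With all four a's equal to one, the product collapses to 1 - L^4.
seasonal_poly = hegy_polynomial(1, 1, 1, 1)
print(seasonal_poly)

# Case 2 (a2 = 1): y_t = -y_{t-1} alternates in sign.
case2 = [1]
for _ in range(4):
    case2.append(-case2[-1])

# Case 3 (a3 = 1): y_t = i * y_{t-1} repeats every fourth period.
case3 = [1 + 0j]
for _ in range(4):
    case3.append(1j * case3[-1])
print(case2, case3)
```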


To develop the test, view (4.25) as a function of $a_1$, $a_2$, $a_3$, and $a_4$ and take a
Taylor series approximation of $A(L)$ around the point $a_1 = a_2 = a_3 = a_4 = 1$.
Although the details of the expansion are messy, first take the partial derivative:

$$\partial A(L)/\partial a_1 = \partial\bigl[(1 - a_1 L)(1 + a_2 L)(1 - a_3 iL)(1 + a_4 iL)\bigr]/\partial a_1 = -(1 + a_2 L)(1 - a_3 iL)(1 + a_4 iL) L$$

Evaluating this derivative at the point $a_1 = a_2 = a_3 = a_4 = 1$ yields

$$-L(1 + L)(1 - iL)(1 + iL) = -L(1 + L)(1 + L^2) = -L(1 + L + L^2 + L^3)$$


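Next, form

$$\partial A(L)/\partial a_2 = \partial\bigl[(1 - a_1 L)(1 + a_2 L)(1 - a_3 iL)(1 + a_4 iL)\bigr]/\partial a_2 = (1 - a_1 L)(1 - a_3 iL)(1 + a_4 iL) L$$

Evaluating at the point $a_1 = a_2 = a_3 = a_4 = 1$ yields $(1 - L + L^2 - L^3)L$. It
should not take too long to convince yourself that evaluating $\partial A(L)/\partial a_3$ and
$\partial A(L)/\partial a_4$ at the point $a_1 = a_2 = a_3 = a_4 = 1$ yields

$$\partial A(L)/\partial a_3 = -(1 - L^2)(1 + iL) iL$$

and

$$\partial A(L)/\partial a_4 = (1 - L^2)(1 - iL) iL$$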


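Since $A(L)$ evaluated at $a_1 = a_2 = a_3 = a_4 = 1$ is $(1 - L^4)$, it is possible to
approximate (4.25) by

$$\bigl[(1 - L^4) - L(1 + L + L^2 + L^3)(a_1 - 1) + (1 - L + L^2 - L^3)L(a_2 - 1) - (1 - L^2)(1 + iL)iL(a_3 - 1) + (1 - L^2)(1 - iL)iL(a_4 - 1)\bigr] y_t = \varepsilon_t$$

Define $\gamma_i$ such that $\gamma_i = (a_i - 1)$ and note that $(1 + iL)i = i - L$ and
$(1 - iL)i = i + L$; hence,

$$(1 - L^4) y_t = \gamma_1 (1 + L + L^2 + L^3) y_{t-1} - \gamma_2 (1 - L + L^2 - L^3) y_{t-1} + (1 - L^2)\bigl[\gamma_3 (i - L) - \gamma_4 (i + L)\bigr] y_{t-1} + \varepsilon_t$$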




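so that

$$(1 - L^4) y_t = \gamma_1 (1 + L + L^2 + L^3) y_{t-1} - \gamma_2 (1 - L + L^2 - L^3) y_{t-1} + (1 - L^2)\bigl[(\gamma_3 - \gamma_4) i - (\gamma_3 + \gamma_4) L\bigr] y_{t-1} + \varepsilon_t \tag{4.26}$$

To purge the imaginary numbers from (4.26), define $\gamma_5$ and $\gamma_6$ such that
$2\gamma_3 = \gamma_6 - i\gamma_5$ and $2\gamma_4 = \gamma_6 + i\gamma_5$. Hence, $(\gamma_3 - \gamma_4) i = \gamma_5$ and
$\gamma_3 + \gamma_4 = \gamma_6$. Substituting into (4.26) yields

$$(1 - L^4) y_t = \gamma_1 (1 + L + L^2 + L^3) y_{t-1} - \gamma_2 (1 - L + L^2 - L^3) y_{t-1} + (1 - L^2)(\gamma_5 - \gamma_6 L) y_{t-1} + \varepsilon_t$$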


Thus, to implement the procedure, use the following steps:

STEP 1: Form the following variables:

$$y_{1t-1} = (1 + L + L^2 + L^3) y_{t-1} = y_{t-1} + y_{t-2} + y_{t-3} + y_{t-4}$$
$$y_{2t-1} = (1 - L + L^2 - L^3) y_{t-1} = y_{t-1} - y_{t-2} + y_{t-3} - y_{t-4}$$
$$y_{3t-1} = (1 - L^2) y_{t-1} = y_{t-1} - y_{t-3} \quad \text{so that} \quad y_{3t-2} = y_{t-2} - y_{t-4}$$

STEP 2: Estimate the regression:

$$(1 - L^4) y_t = \gamma_1 y_{1t-1} - \gamma_2 y_{2t-1} + \gamma_5 y_{3t-1} - \gamma_6 y_{3t-2} + \varepsilon_t$$

You might want to modify the form of the equation by including an intercept,
deterministic seasonal dummies, and a linear time trend. As in the augmented form
of the Dickey-Fuller test, lagged values of $(1 - L^4) y_{t-i}$ may also be included.
Perform the appropriate diagnostic checks to ensure that the residuals from the
regression equation approximate a white-noise process.


STEP 3: Form the t-statistic for the null hypothesis $\gamma_1 = 0$; the appropriate
critical values are reported in Hylleberg et al. (1990). If you do not reject the
hypothesis $\gamma_1 = 0$, conclude that $a_1 = 1$, so there is a nonseasonal unit root.
Next, form the t-test for the hypothesis $\gamma_2 = 0$. If you do not reject the null
hypothesis, conclude that $a_2 = 1$ and there is a unit root with a semiannual
frequency. Finally, perform the F-test for the hypothesis $\gamma_5 = \gamma_6 = 0$. If the
calculated value is less than the critical value reported in Hylleberg et al.
(1990), conclude that $\gamma_5$ and/or $\gamma_6$ is zero, so that there is a unit root with
an annual frequency. Be aware that the three null hypotheses are not alternatives;
a series may have nonseasonal, semiannual, and annual unit roots.


At the 5% significance level, Hylleberg et al. (1990) report that the critical
values using 100 observations are:

                                                    γ1 = 0    γ2 = 0    γ5 = γ6 = 0
Intercept                                            -2.88     -1.95        3.08
Intercept plus seasonal dummies                      -2.95     -2.94        6.57
Intercept plus seasonal dummies plus time trend      -3.53     -2.94        6.60
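Steps 1 and 3 can be sketched as follows. The function names, the dictionary keys, and the three test statistics passed to `hegy_decisions` are hypothetical; the critical values are the intercept-only row quoted above. The sketch assumes the t-tests are left-tailed (the critical values are negative), and it leaves the Step 2 regression to a regression routine.

```python
def hegy_variables(y):
    """Build (1 - L^4) y_t and the Step 1 regressors for every usable date t."""
    rows = []
    for t in range(4, len(y)):
        l1, l2, l3, l4 = y[t - 1], y[t - 2], y[t - 3], y[t - 4]
        rows.append({
            "d4y": y[t] - l4,                 # (1 - L^4) y_t
            "y1_lag1": l1 + l2 + l3 + l4,     # (1 + L + L^2 + L^3) y_{t-1}
            "y2_lag1": l1 - l2 + l3 - l4,     # (1 - L + L^2 - L^3) y_{t-1}
            "y3_lag1": l1 - l3,               # (1 - L^2) y_{t-1}
            "y3_lag2": l2 - l4,               # (1 - L^2) y_{t-2}
        })
    return rows

def hegy_decisions(t_gamma1, t_gamma2, f_gamma56, crit):
    """True means the corresponding unit root is NOT rejected."""
    c1, c2, cf = crit
    return {
        "nonseasonal": t_gamma1 > c1,         # fail to reject gamma_1 = 0
        "semiannual": t_gamma2 > c2,          # fail to reject gamma_2 = 0
        "annual": f_gamma56 < cf,             # fail to reject gamma_5 = gamma_6 = 0
    }

rows = hegy_variables([1, 2, 3, 4, 5, 6])
print(rows[0])

INTERCEPT_ONLY = (-2.88, -1.95, 3.08)         # 5% values, intercept-only row of the table
decisions = hegy_decisions(-1.0, -3.0, 5.0, INTERCEPT_ONLY)   # hypothetical statistics
print(decisions)
```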


4. EXAMPLES OF THE AUGMENTED DICKEY-FULLER TEST


The last chapter reviewed the evidence reported by Nelson and Plosser (1982)
suggesting that macroeconomic variables are difference stationary rather than trend
stationary. We are now in a position to consider their formal tests of the
hypothesis. For each series under study, Nelson and Plosser estimated the regression:

$$\Delta y_t = a_0 + a_2 t + \gamma y_{t-1} + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t$$

The chosen lag lengths are reported in the column labeled p in Table 4.2. The
estimated values of $a_0$, $a_2$, and $\gamma$ are reported in columns 3, 4, and 5,
respectively.


Table 4.2 Nelson and Plosser’s Tests for Unit Roots

                          p     a0        a2        γ          γ + 1
Real GNP                  2     0.819     0.006     -0.175     0.825
                                (3.03)    (3.03)    (-2.99)
Nominal GNP               2     1.06      0.006     -0.101     0.899
                                (2.37)    (2.34)    (-2.32)
Industrial production     6     0.103     0.007     -0.165     0.835
                                (4.32)    (2.44)    (-2.53)
Unemployment rate         4     0.518     -0.000    -0.294*    0.706
                                (2.81)    (-0.23)   (-3.55)

Notes: 1. p is the chosen lag length. Coefficients divided by their standard errors are in
          parentheses. Thus, entries in parentheses represent the t-test for the null
          hypothesis that a coefficient is equal to zero. Under the null of nonstationarity,
          it is necessary to use the Dickey-Fuller critical values. At the 0.05 significance
          level, the critical value for the t-statistic is -3.45.

       2. An asterisk (*) denotes significance at the 0.05 level. For real and nominal GNP
          and industrial production, it is not possible to reject the null γ = 0 at the
          0.05 level. Hence, only the unemployment rate appears to be stationary.

       3. The expression γ + 1 is the estimate of the partial autocorrelation between
          y_t and y_{t-1}.




Recall that the traditional view of business cycles maintains that the GNP and
production levels are trend stationary rather than difference stationary. An adherent
of this view must assert that $\gamma$ is different from zero; if $\gamma = 0$, the series has a
unit root and is difference stationary. Given the sample sizes used by Nelson and
Plosser (1982), at the 0.05 level, the critical value of the t-statistic for the null
hypothesis $\gamma = 0$ is -3.45. Thus, only if the estimated value of $\gamma$ is more than 3.45
standard deviations from zero, is it possible to reject the hypothesis that $\gamma = 0$. As
can be seen from inspection of Table 4.2, the estimated values of $\gamma$ for real GNP,
nominal GNP, and industrial production are not statistically different from zero.
Only the unemployment rate has an estimated value of $\gamma$ that is significantly
different from zero at the 0.05 level.
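The decision rule applied to Table 4.2 amounts to a one-sided comparison of each t-statistic with the Dickey-Fuller critical value of -3.45. The sketch below simply encodes the $\gamma$ estimates and t-statistics reported in the table; the dictionary layout and variable names are illustrative, and the negative sign on the real GNP estimate is inferred from the reported value of $\gamma + 1$.

```python
CRITICAL_VALUE = -3.45        # 5% Dickey-Fuller critical value quoted in the text

# (estimate of gamma, t-statistic for gamma = 0) as reported in Table 4.2
table_4_2 = {
    "Real GNP": (-0.175, -2.99),
    "Nominal GNP": (-0.101, -2.32),
    "Industrial production": (-0.165, -2.53),
    "Unemployment rate": (-0.294, -3.55),
}

for series, (gamma, t_stat) in table_4_2.items():
    reject = t_stat < CRITICAL_VALUE          # left-tailed test of the unit root null
    print(f"{series:22s} gamma + 1 = {1 + gamma:.3f}  t = {t_stat:6.2f}  reject: {reject}")

rejected = [s for s, (_, t) in table_4_2.items() if t < CRITICAL_VALUE]
print(rejected)
```

Note that a t-statistic of -2.99 would look significant against the usual critical value of about -2; it is the use of the Dickey-Fuller distribution that leaves the unit root null standing for the GNP and production series.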


Unit Roots and Purchasing-Power Parity 


Purchasing-power parity (PPP) is a simple relationship linking national price levels
and exchange rates. In its simplest form, PPP asserts that the rate of currency
depreciation is approximately equal to the difference between the domestic and
foreign inflation rates. If $p$ and $p^*$ denote the logarithms of the U.S. and foreign
price levels and $e$ the logarithm of the dollar price of foreign exchange, PPP implies

$$e_t = p_t - p_t^* + d_t$$

where $d_t$ represents the deviation from PPP in period t.


Figure 4.2 Real exchange rates (Jan. 1973 = 1.00): Canada, Germany, and Japan, 1973 to 1989.


In applied work, $p_t$ and $p_t^*$ usually refer to national price indices in t relative
to a base year, so that $e_t$ refers to an index of the domestic currency price of
foreign exchange relative to a base year. For example, if the U.S. inflation rate is
10% while the foreign inflation rate is 15%, the dollar price of foreign exchange
should fall by approximately 5%. The presence of the term $d_t$ allows for short-run
deviations from PPP.

Because of its simplicity and intuitive appeal, PPP has been used extensively in
theoretical models of exchange rate determination. However, as in the well-known
Dornbusch (1976) “overshooting” model, real economic shocks, such as productivity
or demand shocks, can cause permanent deviations from PPP. For our purposes,
the theory of PPP serves as an excellent vehicle to illustrate many time-series
testing procedures. One test of long-run PPP is to determine whether $d_t$ is
stationary. After all, if the deviations from PPP are nonstationary (i.e., if the
deviations are permanent in nature), we can reject the theory. Note that PPP does
allow for persistent deviations; the autocorrelations of the $\{d_t\}$ sequence need not
be zero. One popular testing procedure is to define the “real” exchange rate in
period t as

$$r_t = e_t + p_t^* - p_t$$


Long-run PPP is said to hold if the {r,} sequence is stationary. For example, in 
Enders (1988), I constructed real exchange rates for three major U.S. trading part- 
ners: Germany, Canada, and Japan. The data were divided into two periods: 
January 1960 to April 1971 (representing the fixed exchange rate period) and 
January 1973 to November 1986 (representing the flexible exchange rate period). 
Each nation’s Wholesale Price Index (WPI) was multiplied by an index of the U.S. 
dollar price of the foreign currency and then divided by the U.S. WPI. The log of 
the constructed series is the {r,} sequence. Updated values of the real exchange rate 
data used in the study are in the file REAL.PRN contained on the data disk. As an 
exercise, you should use this data to verify the results reported below. 
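The construction of the $\{r_t\}$ sequence can be sketched as follows. The function name and the two-period index values are made up for illustration (they are not taken from REAL.PRN); they reuse the 10% and 15% inflation rates from the earlier example and set the exchange rate so that PPP holds exactly.

```python
import math

def log_real_exchange_rate(foreign_wpi, dollar_price_index, us_wpi):
    """r_t = e_t + p*_t - p_t, computed from index levels rather than logs."""
    return [math.log(pf * e / p)
            for pf, e, p in zip(foreign_wpi, dollar_price_index, us_wpi)]

# Hypothetical two-period example: U.S. prices rise 10%, foreign prices rise 15%.
us_wpi = [1.00, 1.10]
foreign_wpi = [1.00, 1.15]
dollar_price_index = [1.00, 1.10 / 1.15]      # exchange rate that keeps PPP exact

r = log_real_exchange_rate(foreign_wpi, dollar_price_index, us_wpi)
change_in_e = math.log(dollar_price_index[1]) - math.log(dollar_price_index[0])
print(r, round(change_in_e, 4))
```

When PPP holds exactly, the log real rate stays at zero and the log exchange rate falls by roughly the inflation differential; any nonzero $r_t$ measures the deviation $d_t$.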

A critical first step in any econometric analysis is to visually inspect the data. 
The plots of the three real exchange rate series during the flexible exchange rate pe- 
riod are shown in Figure 4.2. Each series seems to meander in a fashion characteris- 
tic of a random walk process. Notice that there is little visual evidence of explosive 
behavior or a deterministic time trend. Consider Figure 4.3 that shows the autocor- 
relation function of the Canadian real rate in levels, part (a), and first differences, 
part (b). This autocorrelation pattern is typical of all the series in the analysis. The 
autocorrelation function shows little tendency to decay, whereas the autocorrela- 
tions of the first differences display the classic pattern of a stationary series. In 
graph (b), all autocorrelations (with the possible exception of p,, that equals 0.18) 
are not statistically different from zero at the usual significance levels. 

To formally test for the presence of a unit root in the real exchange rates,
augmented Dickey-Fuller tests of the form given by (4.19) were conducted. The
regression $\Delta r_t = a_0 + \gamma r_{t-1} + \beta_2 \Delta r_{t-1} + \beta_3 \Delta r_{t-2} + \cdots$ was estimated
based on the following considerations:


Figure 4.3 ACF of Canada’s real exchange rate. (a) Levels. (b) First differences. Lags 1 to 12.


1. The theory of PPP does not allow for a deterministic time trend or multiple unit
roots. Any such findings would refute the theory as posited. Although all the
series decline throughout the early 1980s and all rise during the mid to late 1980s,
there is no a priori reason to expect a structural change. Pretesting the data using
the Dickey-Pantula (1987) strategy showed no evidence of multiple unit roots.
Moreover, there was no reason to entertain the notion of trend stationarity; the
expression $a_2 t$ was not included in the estimating equation.


2. In both time periods, F-tests and the SBC indicated that $\beta_3$ and the
higher-order lag coefficients could be set equal to zero. For Germany and Japan
during the flexible rate period, $\beta_2$ was statistically different from zero; in the
other four instances, $\beta_2$ could be set equal to zero. In spite of these findings,
with monthly data it is always important to entertain the possibility of a lag
length no shorter than 12 months. As such, tests were conducted using the short
lags selected by the F-tests and SBC and using a lag length of 12 months.


For the Canadian case during the 1973 to 1986 period, the t-statistic for the null
hypothesis that $\gamma = 0$ is -1.42 using no lags and -1.51 using all 12 lags. Given the
critical value of the $\tau_\mu$ statistic, it is not possible to reject the null of a unit
root in the Canadian/U.S. real exchange rate series. Hence, PPP fails for these two
nations. In the 1960 to 1971 period, the calculated value of the t-statistic is
-1.59; again, it is possible to conclude that PPP fails.

Table 4.3 reports the results of all six estimations using the short lag lengths sug- 
gested by the F-tests and SBC. Notice the following properties of the estimated 
models: 


1. For all six models, it is not possible to reject the null hypothesis that PPP fails.
As can be seen from the last column of Table 4.3, the absolute value of the
t-statistic for the null $\gamma = 0$ is never more than 1.59. The economic
interpretation is that real productivity and/or demand shocks have had a permanent
influence on real exchange rates.


2. As measured by the sample standard deviation (SD), real exchange rates were
far more volatile in the 1973 to 1986 period than the 1960 to 1971 period.
Moreover, as measured by the standard error of the estimate (SEE), real exchange
rate volatility is associated with unpredictability. The SEE during the flexible
exchange rate period is several hundred times that of the fixed rate period. It
seems reasonable to conclude that the change in the exchange rate regime (i.e., the
end of Bretton-Woods) affected the volatility of the real exchange rate.


3. Care must be taken to keep the appropriate null hypothesis in mind. Under the
null of a unit root, classical test procedures are inappropriate and we resort to the
statistics tabulated by Dickey and Fuller. However, classical test procedures
(which assume stationary variables) are appropriate under the null that the real
rates are stationary. Thus, the following possibility arises. Suppose that the
t-statistic in the Canadian case happened to be -2.16 instead of -1.42. Using the
Dickey-Fuller critical values, you would not reject the null of a unit root; hence,
you could conclude that PPP fails. However, under the null of stationarity
(where we can use classical procedures), $\gamma$ is more than two standard deviations
from zero and you would conclude PPP holds since the usual t-test becomes
applicable.

This apparent dilemma commonly occurs when analyzing series with roots
close to unity in absolute value. Unit root tests do not have much power in
discriminating between characteristic roots close to unity and actual unit roots. The
dilemma is only apparent since the two null hypotheses are quite different. It is
perfectly consistent to maintain a null that PPP holds and not be able to reject a
null that PPP fails! Notice that this dilemma does not actually arise for any of
the series reported in Table 4.3; for each, it is not possible to reject a null of
$\gamma = 0$ at conventional significance levels.


4. Looking at some of the diagnostic statistics, we see that all the F-statistics
indicate that it is appropriate to exclude lags 2 (or 3) through 12 from the
regression equation. To reinforce the use of short lags, notice that the first-order
correlation coefficient of the residuals ($\rho$) is low and the Durbin-Watson statistic
close to 2. It is interesting that all the point estimates of the characteristic
roots indicate that real exchange rates are convergent. To obtain the
characteristic roots, rewrite the estimated equations in the autoregressive form
$r_t = a_0 + a_1 r_{t-1}$ or $r_t = a_0 + a_1 r_{t-1} + a_2 r_{t-2}$. For the four AR(1) models, the
point estimates of the slope coefficients are all less than unity. In the
post-Bretton-Woods period (1973-1986), the point estimates of the characteristic
roots of Japan’s second-order process are 0.931 and 0.319; for Germany, the roots
are 0.964 and 0.256. However, this is precisely what we would expect if PPP fails;
under the null of a unit root, we know that $\gamma$ is biased downward.
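The characteristic roots quoted for the two second-order models can be recovered from the autoregressive coefficients by solving $x^2 - a_1 x - a_2 = 0$. In the sketch below the function name is illustrative, and the coefficient pairs (1.25, -0.297) for Japan and (1.22, -0.247) for Germany are assumed to be the $a_1$ and $a_2$ entries of Table 4.3; they are the values that reproduce the roots stated in the text.

```python
import cmath

def ar2_roots(a1, a2):
    """Characteristic roots of r_t = a0 + a1 r_{t-1} + a2 r_{t-2}: solve x^2 - a1 x - a2 = 0."""
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)     # complex sqrt also covers the oscillatory case
    return (a1 + disc) / 2.0, (a1 - disc) / 2.0

models = {"Japan": (1.25, -0.297), "Germany": (1.22, -0.247)}
roots = {}
for country, (a1, a2) in models.items():
    big, small = ar2_roots(a1, a2)
    roots[country] = (big.real, small.real)
    gamma = a1 + a2 - 1.0                     # coefficient on r_{t-1} in the differenced form
    print(country, round(big.real, 3), round(small.real, 3), "gamma =", round(gamma, 3))
```

Both larger roots lie just below one, which is the convergent-but-near-unit-root pattern described above.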


Table 4.3 Real Exchange Rate Estimation 


Hy: y=9 


SD/SEE 


a, Mean p/DW 


a, 


1973-1986 


Canada 


~1.42 


5.47 
1.16 


0.059 0.194 
10.44 


1.88 
—0.007 


1.05 


0.978 


~0,022 


(0.0155) 


~0.047 


2.81 
20.68 


0.226 


1.01 


-0.297 


1.25 


Japan 


2.0K | 
~0.014 


(0.074) 
—0.027 


0.280 


l 


LI 0.858 


—0.247 


1.22 


Germany 


3.71 


2.004 


(0.076) 


1960-197} 
Canada 


-1.59 


0.014 


0.434 


-0.107 


1.02 


0.969 


0.031 


0.004 


0.017 


2.21 


(0.019) 
—0.030 


~1.04 


0.330 


0.046 
1.98 


0.005 


0.026 


0.980 


0.970 


Japan 


(0.028) 
—0.016 


t= -1.23 


0.097 


0.038 
1.93 


1.01 


0.984 


Germany 


0.004 


(0.012) 


Notes:

1. Standard errors are in parentheses.

2. Mean is the estimated value of $a_0/(1 - a_1 - a_2)$, SD the standard deviation of the
real exchange rate, SEE the estimated standard deviation of the residuals (i.e., the
standard error of the estimate), F the significance level of the test that lags 2
(or 3) through 12 can be excluded, DW the Durbin-Watson statistic for first-order
serial correlation, and $\rho$ the estimated autocorrelation coefficient.

3. Entries are the t-statistic for the hypothesis $\gamma = 0$.


5. PHILLIPS-PERRON TESTS 


The distribution theory supporting the Dickey-Fuller tests assumes that the errors
are statistically independent and have a constant variance. In using this
methodology, care must be taken to ensure that the error terms are uncorrelated and
have constant variance. Phillips and Perron (1988) developed a generalization of the
Dickey-Fuller procedure that allows for fairly mild assumptions concerning the
distribution of the errors.


To briefly explain the procedure, consider the following regression equations:

$$y_t = a_0^* + a_1^* y_{t-1} + \mu_t \tag{4.27}$$

and

$$y_t = \tilde{a}_0 + \tilde{a}_1 y_{t-1} + \tilde{a}_2 (t - T/2) + \mu_t \tag{4.28}$$

where T = number of observations and the disturbance term $\mu_t$ is such that
$E\mu_t = 0$, but there is no requirement that the disturbance term is serially
uncorrelated or homogeneous. Instead of the Dickey-Fuller assumptions of
independence and homogeneity, the Phillips-Perron test allows the disturbances
to be weakly dependent and heterogeneously distributed.


Phillips and Perron characterize the distributions and derive test statistics that
can be used to test hypotheses about the coefficients $a_i^*$ and $\tilde{a}_i$ under the null
hypothesis that the data are generated by

$$y_t = y_{t-1} + \mu_t$$


The Phillips-Perron test statistics are modifications of the Dickey-Fuller t-statistics
that take into account the less restrictive nature of the error process. The
expressions are extremely complex; to actually derive them would take us far beyond
the scope of this book. However, many statistical time-series software packages
now calculate these statistics, so that they are directly available. For the ambitious
reader, the formulas used to calculate these statistics are reported in the appendix to
this chapter. The most useful of the test statistics are as follows:

\(Z(ta_1^*)\): Used to test the hypothesis \(a_1^* = 1\)
\(Z(t\tilde{a}_1)\): Used to test the hypothesis \(\tilde{a}_1 = 1\)
\(Z(t\tilde{a}_2)\): Used to test the hypothesis \(\tilde{a}_2 = 0\)
\(Z(\phi_3)\): Used to test the hypotheses \(\tilde{a}_1 = 1\) and \(\tilde{a}_2 = 0\)

The critical values for the Phillips-Perron statistics are precisely those given for
the Dickey-Fuller tests. For example, the critical values for \(Z(ta_1^*)\) and \(Z(t\tilde{a}_1)\) are
those given in the Dickey-Fuller tables under the headings \(\tau_\mu\) and \(\tau_\tau\), respectively.
The critical values of \(Z(\phi_3)\) are given by the Dickey-Fuller \(\phi_3\) statistic.




Do not be deceived by the apparent simplicity of Equations (4.27) and (4.28). In
reality, they are far more general than the type of data-generating process allowable by
the Dickey-Fuller procedure. For example, suppose that the \(\{\mu_t\}\) sequence is generated
by the mixed autoregressive moving-average process \(\mu_t = [C(L)/B(L)]\epsilon_t\), where \(B(L)\) and \(C(L)\) are
polynomials in the lag operator. Given this form of the error process, we can write
Equation (4.27) in the form used in the Dickey-Fuller tests; that is,

\[ B(L)y_t = a_0^* B(L) + a_1^* B(L)y_{t-1} + C(L)\epsilon_t \]

or

\[ (1 - a_1^* L)B(L)y_t = a_0 + C(L)\epsilon_t \]

where \(a_0 = a_0^* B(L)\).

Thus, the Phillips-Perron procedure can be applied to mixed processes in the
same way as the Dickey-Fuller tests.
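To make the idea of an "adjusted" t-statistic concrete, the following is a minimal sketch of one widely used form of the Phillips-Perron correction for the regression with an intercept only. It is not taken from the appendix formulas of this chapter; it assumes a Bartlett-weighted estimate of the long-run variance with a truncation lag `q` chosen by the user, and the function names are hypothetical. With `q = 0` the correction vanishes and the statistic reduces to the ordinary Dickey-Fuller t-statistic.

```python
import math
import random


def autocov(u, j):
    """Sample autocovariance of the residuals u at lag j (divisor T)."""
    T = len(u)
    return sum(u[t] * u[t - j] for t in range(j, T)) / T


def pp_t_stat(y, q):
    """Return (DF t-statistic, adjusted Z(t)) for a1 = 1 in y_t = a0 + a1*y_{t-1} + u_t."""
    x, z = y[:-1], y[1:]
    T = len(z)
    xbar, zbar = sum(x) / T, sum(z) / T
    sxx = sum((xi - xbar) ** 2 for xi in x)
    a1 = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / sxx
    a0 = zbar - a1 * xbar
    u = [zi - a0 - a1 * xi for xi, zi in zip(x, z)]
    s2 = sum(e * e for e in u) / (T - 2)       # residual variance
    se = math.sqrt(s2 / sxx)                   # standard error of a1
    t_df = (a1 - 1.0) / se                     # ordinary Dickey-Fuller t-statistic
    gamma0 = autocov(u, 0)
    # Long-run variance with Bartlett weights (assumed kernel and truncation lag q).
    lam2 = gamma0 + 2.0 * sum((1.0 - j / (q + 1)) * autocov(u, j)
                              for j in range(1, q + 1))
    z_t = (math.sqrt(gamma0 / lam2) * t_df
           - 0.5 * (lam2 - gamma0) / math.sqrt(lam2) * (T * se / math.sqrt(s2)))
    return t_df, z_t


# Illustration on a simulated random walk (the seed and lag are arbitrary choices).
rng = random.Random(42)
walk = [0.0]
for _ in range(250):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))
t_df, z_t = pp_t_stat(walk, q=4)
```

The adjusted statistic would be compared with the same Dickey-Fuller critical values as the unadjusted one; only the statistic changes, not the table.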


Foreign Exchange Market Efficiency 


Corbae and Ouliaris (1986) used Phillips-Perron tests to determine whether (1) exchange
rates follow a random walk and (2) the return to forward exchange market
speculation contains a unit root. Denote the spot dollar price of foreign exchange on
day t as \(s_t\). An individual at t can also buy or sell foreign exchange forward. A 90-day
forward contract requires that on day t + 90, the individual take delivery (or
make payment) of a specified amount of foreign exchange in return for a specified
amount of dollars. Let \(f_t\) denote the 90-day forward market price of foreign exchange
purchased on day t. On day t, suppose that an individual speculator buys
forward pounds at the price \(f_t\) = $2.00/pound. Thus, in 90 days the individual is
obligated to provide $200,000 in return for £100,000. Of course, the agent may
choose to immediately sell these pounds on the spot market. If on day t + 90, the
spot price happens to be \(s_{t+90}\) = $2.01/pound, the individual can sell the £100,000
for $201,000; without transactions costs taken into account, the individual earns
a profit of $1000. In general, the profit on such a transaction will be \(s_{t+90} - f_t\)
multiplied by the number of pounds transacted. (Note that profits will be negative if
\(s_{t+90} < f_t\).) Of course, it is possible to speculate by selling forward pounds also. An
individual selling 90-day forward pounds on day t will be able to buy them on the
spot market at \(s_{t+90}\). Here, profits will be \(f_t - s_{t+90}\) multiplied by the number of
pounds transacted. The efficient market hypothesis maintains that the expected
profit or loss from such speculative behavior must be zero. Let \(E_t s_{t+90}\) denote the
expectation of the spot rate for day t + 90 conditioned on the information available
on day t. Since we actually know \(f_t\) on day t, the efficient market hypothesis for
forward exchange market speculation can be written as


\[ E_t s_{t+90} = f_t \]

or

\[ s_{t+90} - f_t = p_t \]

where \(p_t\) = per unit profit from speculation and \(E_t p_t = 0\).
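The arithmetic of the speculation example can be checked with a few lines of code. This is only an illustrative sketch; the function name and its arguments are hypothetical and transactions costs are ignored, as in the example.

```python
def speculation_profit(spot_at_delivery, forward_price, pounds, buy_forward=True):
    """Dollar profit from a forward position that is closed out on the spot market."""
    per_unit = spot_at_delivery - forward_price   # s_{t+90} - f_t for a forward purchase
    if not buy_forward:
        per_unit = -per_unit                      # f_t - s_{t+90} for a forward sale
    return per_unit * pounds


# Buying 100,000 pounds forward at $2.00 and selling them spot at $2.01.
long_profit = speculation_profit(2.01, 2.00, 100_000)
# The opposite position loses the same amount.
short_profit = speculation_profit(2.01, 2.00, 100_000, buy_forward=False)
```

Under the efficient market hypothesis, the expected value of either profit, conditional on day-t information, is zero.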


Thus, the efficient market hypothesis requires that for any time period t, the 90-day
forward rate (i.e., \(f_t\)) be an unbiased estimator of the spot rate 90 days from t.
Suppose that a researcher collected weekly data of spot and forward exchange
rates. The data set would consist of the forward rates \(f_t, f_{t+7}, f_{t+14}, \ldots\) and spot
rates \(s_t, s_{t+7}, s_{t+14}, \ldots\). By using these exchange rates, it is possible to construct the
sequence \(s_{t+90} - f_t = p_t\), \(s_{t+7+90} - f_{t+7} = p_{t+7}\), \(s_{t+14+90} - f_{t+14} = p_{t+14}, \ldots\). Normalize
the time period to 1 week, so that \(y_1 = p_t\), \(y_2 = p_{t+7}\), \(y_3 = p_{t+14}, \ldots\), and consider the
regression equation (where ~ is dropped for simplicity):

\[ y_t = a_0 + a_1 y_{t-1} + a_2 t + \mu_t \]


The efficient market hypothesis asserts that ex ante expected profit must equal
zero; hence, with quarterly data, it should be the case that \(a_0 = a_1 = a_2 = 0\).
However, the way that the data set was constructed means that the residuals will be
correlated. As Corbae and Ouliaris (1986) point out, suppose that there is relevant
exchange market "news" at date T. Agents will incorporate this news into all forward
contracts signed in periods subsequent to T. However, the realized returns for
all preexisting contracts will be affected by the news. Since there are approximately
13 weeks in a 90-day period, we can expect the \(\{\mu_t\}\) sequence to be an MA(12)
process. Although ex ante expected returns may be zero, the ex post returns from
speculation at t will be correlated with the returns from those engaging in forward
contracts at weeks t + 1 through t + 12.
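The overlap argument can be illustrated with a small simulation. The sketch below makes a simplifying assumption that is not in the text: the ex post return on a contract signed in week t is just the sum of the white-noise "news" arriving in the following 13 weeks. Two contracts signed k weeks apart then share 13 - k news items, so the autocorrelation is (13 - k)/13 for k up to 12 and zero afterward, which is the signature of an MA(12) process. All function names are hypothetical.

```python
import random

HORIZON = 13  # approximate number of weeks in a 90-day contract


def theoretical_acf(k, horizon=HORIZON):
    """Autocorrelation at lag k of a sum of `horizon` consecutive white-noise shocks."""
    return max(horizon - abs(k), 0) / horizon


def overlapping_returns(n, horizon=HORIZON, seed=0):
    """Ex post returns on n weekly contracts, each exposed to the next `horizon` news items."""
    rng = random.Random(seed)
    news = [rng.gauss(0.0, 1.0) for _ in range(n + horizon)]
    return [sum(news[t + 1:t + 1 + horizon]) for t in range(n)]


def sample_acf(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    ck = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return ck / c0


returns = overlapping_returns(20000)
acf_lag_1 = sample_acf(returns, 1)     # should be near 12/13
acf_lag_20 = sample_acf(returns, 20)   # should be near zero
```

Even though every news item is unpredictable, the realized returns are strongly serially correlated, which is why a test that assumes white-noise disturbances is inappropriate here.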

Meese and Singleton (1982) assumed white-noise disturbances in using a 
Dickey—Fuller test to study the returns from forward market speculation. One sur- 
prising result was that the return from forward speculation in the Swiss franc con- 
tained a unit root. This finding contradicts the efficient market hypothesis since it 
implies the existence of a permanent component in the sequence of returns. 
However, the assumption of white-noise disturbances is inappropriate if the \(\{\mu_t\}\)
sequence is an MA(12) process. Instead, Corbae and Ouliaris use the more appropriate
Phillips-Perron procedure to analyze foreign exchange market efficiency;
some of their results are contained in Table 4.4.

First, consider the test for the unit root hypothesis (i.e., \(a_1 = 1\)). All estimated
values of \(a_1\) exceed 0.9; the first-order autocorrelation of the returns from speculation
appears to be quite high. However, given the small standard errors, all estimated
values are over four standard deviations from unity. At the 5% significance
level, the critical value for a test of \(a_1 = 1\) is -3.43. Note that this critical value is
the Dickey-Fuller \(\tau_\tau\) statistic with 250 observations. Hence, as opposed to Meese
and Singleton (1982), Corbae and Ouliaris are able to reject the null of a unit root
in all series examined. Thus, shocks to the return from forward exchange market
speculation do not have permanent effects.

Table 4.4 Returns to Forward Speculation

                 a_0                 a_1                 a_2
Switzerland      -0.117E-2           0.941               -0.111E-4
                 (0.106E-2)          (0.159E-1)          (0.834E-5)
                 Z(ta_0) = -1.28     Z(ta_1) = -4.06     Z(ta_2) = -1.07
Canada           -0.651E-3           0.907               0.116E-5
                 (0.409E-3)          (0.191E-1)          (0.298E-5)
                 Z(ta_0) = -1.73     Z(ta_1) = -5.45     Z(ta_2) = -1.42
U.K.             -0.7719E-3          0.937               -0.132E-4
                 (0.903E-3)          (0.163E-1)          (0.720E-5)
                 Z(ta_0) = 0.995     Z(ta_1) = -4.69     Z(ta_2) = -1.50

Notes: 1. Standard errors are in parentheses.
2. Z(ta_0) and Z(ta_2) are the Phillips-Perron adjusted t-statistics for the hypotheses that a_0 = 0
and a_2 = 0, respectively. Z(ta_1) is the Phillips-Perron adjusted t-statistic for the hypothesis
that a_1 = 1.

A second necessary condition for the efficient market hypothesis to hold is that
the intercept term \(a_0\) equal zero. A nonzero intercept term suggests a predictable
gap between the forward rate and spot rate in the future. If \(a_0 \neq 0\), on average,
there are unexploited profit opportunities. It may be that agents are risk-averse or
profit-maximizing speculators are not fully utilizing all available information in
determining their forward exchange positions. In absolute value, all the \(Z(ta_0)\) statistics
are less than the critical value, so that Corbae and Ouliaris cannot reject the
null \(a_0 = 0\). In the same way, they are not able to reject the null hypothesis of no
deterministic time trend (i.e., that \(a_2 = 0\)). The calculated \(Z(ta_2)\) statistics indicate that
the estimated coefficients of the time trend are never more than 1.50 standard errors
from zero.

At this point, you might wonder whether it would be possible to perform the
same sort of analysis using an augmented Dickey-Fuller (ADF) test. After all, Said
and Dickey (1984) showed that the ADF test can be used when the error process is
a moving average. The desirable feature of the Phillips-Perron test is that it allows
for a weaker set of assumptions concerning the error process. Also, Monte Carlo
studies find that the Phillips-Perron test has greater power to reject a false null
hypothesis of a unit root. However, there is a cost entailed with the use of weaker
assumptions. Monte Carlo studies have also shown that in the presence of negative
moving average terms, the Phillips-Perron test tends to reject the null of a unit root
whether or not the actual data-generating process contains a unit root. It is
preferable to use the ADF test when the true model contains negative moving average
terms and the Phillips-Perron test when the true model contains positive moving
average terms.


In practice, the choice of the most appropriate test can be difficult since you
never know the true data-generating process. A safe choice is to use both types of
unit root tests. If they reinforce each other, you can have confidence in the results.
Sometimes, economic theory will be helpful in that it suggests the most appropriate
test. In the Corbae and Ouliaris example, excess returns should be positively correlated;
hence, the Phillips-Perron test is a reasonable choice.


6. STRUCTURAL CHANGE 


In performing unit root tests, special care must be taken if it is suspected that structural
change has occurred. When there are structural breaks, the various Dickey-Fuller
and Phillips-Perron test statistics are biased toward the nonrejection of a unit
root. To explain, consider the situation in which there is a one-time change in the
mean of an otherwise stationary sequence. In the top graph (a) of Figure 4.4, the
\(\{y_t\}\) sequence was constructed so as to be stationary around a mean of zero for t =
0, ..., 50 and then to fluctuate around a mean of 6 for t = 51, ..., 100. The sequence
was formed by drawing 100 normally and independently distributed values
for the \(\{\epsilon_t\}\) sequence. By setting \(y_0 = 0\), the next 100 values in the sequence were
generated using the formula:

\[ y_t = 0.5y_{t-1} + \epsilon_t + D_L \tag{4.29} \]

where \(D_L\) is a dummy variable such that \(D_L = 0\) for t = 1, ..., 50 and \(D_L = 3\) for t =
51, ..., 100. The subscript L is designed to indicate that the level of the dummy
changes. At times, it will be convenient to refer to the value of the dummy variable
in period t as \(D_L(t)\); in the example at hand, \(D_L(50) = 0\) and \(D_L(51) = 3\).
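A series of this kind is easy to generate. The sketch below is one way to do it; the function name, default arguments, and seed are illustrative assumptions, and the shocks will of course differ from the ones used for the figure. Setting every shock to zero shows why the post-break mean is 6: the level dummy of 3 is divided by (1 - 0.5).

```python
import random


def simulate_level_break(shocks, a1=0.5, break_size=3.0, break_start=51, y0=0.0):
    """Generate y_t = a1*y_{t-1} + e_t + D_L, where D_L = break_size for t >= break_start."""
    y, prev = [], y0
    for t, e in enumerate(shocks, start=1):
        d_level = break_size if t >= break_start else 0.0
        prev = a1 * prev + e + d_level
        y.append(prev)
    return y


rng = random.Random(0)
shocks = [rng.gauss(0.0, 1.0) for _ in range(100)]
y = simulate_level_break(shocks)

# With the shocks switched off, the series sits at 0 and then converges to 3 / (1 - 0.5).
y_no_noise = simulate_level_break([0.0] * 100)
```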

In practice, the structural change may not be as apparent as the break shown in
the figure. However, the large simulated break is useful for illustrating the problem
of using a Dickey-Fuller test in such circumstances. The straight line shown in the
figure highlights the fact that the series appears to have a deterministic trend. In
fact, the straight line is the best-fitting OLS equation:

\[ y_t = a_0 + a_2 t + e_t \]

In the figure, you can see that the fitted value of \(a_0\) is negative and the fitted
value of \(a_2\) is positive. The proper way to estimate (4.29) is to fit a simple AR(1)
model and allow the intercept to change by including the dummy variable \(D_L\).
However, suppose that we unsuspectingly fit the regression equation:

\[ y_t = a_0 + a_1 y_{t-1} + e_t \tag{4.30} \]

As you can infer from Figure 4.4, the estimated value of \(a_1\) is necessarily biased
toward unity. The reason for this upward bias is that the estimated value of \(a_1\) captures
the property that “low” values of \(y_t\) (i.e., those fluctuating around zero) are
followed by other low values and “high” values (i.e., those fluctuating around a
mean of 6) are followed by other high values. For a formal demonstration, note that
as \(a_1\) approaches unity, (4.30) approaches a random walk plus drift. We know
that the solution to the random walk plus drift model includes a deterministic trend,
that is,

\[ y_t = y_0 + a_0 t + \sum_{i=1}^{t} e_i \]

Thus, the misspecified equation (4.30) will tend to mimic the trend line shown in
Figure 4.4 by biasing \(a_1\) toward unity. This bias in \(a_1\) means that the Dickey-Fuller
test is biased toward accepting the null hypothesis of a unit root, even though the
series is stationary within each of the subperiods.

Figure 4.4 Two models of structural change. (a) \(y_t = 0.5y_{t-1} + \epsilon_t + D_L\); (b) \(y_t = y_{t-1} + \epsilon_t + D_P\).

Of course, a unit root process can exhibit a structural break also. The lower graph
(b) of Figure 4.4 simulates a random walk process with a structural change occurring
at t = 51. This second simulation used the same 100 realizations for the \(\{\epsilon_t\}\)
sequence and set \(y_0 = 2\). The 100 realizations of the \(\{y_t\}\) sequence were constructed
as

\[ y_t = y_{t-1} + \epsilon_t + D_P \]

where \(D_P(51) = 4\) and all other values of \(D_P = 0\).

Here, the subscript P refers to the fact that there is a single pulse in the dummy
variable. In a unit root process, a single pulse in the dummy will have a permanent
effect on the level of the \(\{y_t\}\) sequence. In t = 51, the pulse in the dummy is equivalent
to an \(\epsilon_{51}\) shock of four extra units. Hence, the one-time shock to \(D_P(51)\) has a
permanent effect on the mean value of the sequence for t ≥ 51. In the figure, you
can see that the level of the process takes a discrete jump in t = 51, never exhibiting
any tendency to return to the prebreak level.
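The permanence of a pulse in a unit root process can be verified directly by generating the same random walk with and without the pulse. The following sketch uses hypothetical function names and an arbitrary seed; the gap between the two paths is zero before the pulse and exactly four units in every period from the pulse onward.

```python
import random


def simulate_pulse_random_walk(shocks, pulse=4.0, pulse_date=51, y0=2.0):
    """Generate y_t = y_{t-1} + e_t + D_P, where D_P = pulse only in period pulse_date."""
    y, prev = [], y0
    for t, e in enumerate(shocks, start=1):
        prev = prev + e + (pulse if t == pulse_date else 0.0)
        y.append(prev)
    return y


rng = random.Random(1)
shocks = [rng.gauss(0.0, 1.0) for _ in range(100)]
with_pulse = simulate_pulse_random_walk(shocks)
without_pulse = simulate_pulse_random_walk(shocks, pulse=0.0)

# Period-by-period gap between the two paths (index 0 corresponds to t = 1).
gap = [a - b for a, b in zip(with_pulse, without_pulse)]
```

Repeating the exercise with a stationary AR(1) process instead of a random walk would show the gap decaying toward zero, which is the essential difference between the two cases.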

The bias in the Dickey-Fuller tests was confirmed in a Monte Carlo experiment.
Perron (1989) generated 10,000 replications of a stationary process like that of
(4.29). Each replication was formed by drawing 100 normally and independently
distributed values for the \(\{\epsilon_t\}\) sequence. For each of the 10,000 replicated series,
Perron used OLS to estimate a regression in the form of (4.30).⁸ As could be anticipated
from our earlier discussion, he found that the estimated values of \(a_1\) were biased
toward unity. Moreover, the bias became more pronounced as the magnitude
of the break increased.
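A scaled-down version of this kind of experiment can be run in a few lines. The sketch below is not Perron's design: it uses only 200 replications, an AR coefficient of 0.5, a break at mid-sample, and three arbitrary break sizes, all of which are assumptions made for illustration. It reports the average OLS estimate of the autoregressive coefficient from the misspecified regression with an intercept but no dummy.

```python
import random


def ar1_slope(y):
    """OLS estimate of a1 in y_t = a0 + a1*y_{t-1} + e_t."""
    x, z = y[:-1], y[1:]
    n = len(z)
    xbar, zbar = sum(x) / n, sum(z) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    return sum((v - xbar) * (w - zbar) for v, w in zip(x, z)) / sxx


def mean_slope(break_size, reps=200, T=100, seed=1):
    """Average estimate of a1 when the true process is AR(1) with a mid-sample level break."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y, prev = [], 0.0
        for t in range(1, T + 1):
            d_level = break_size if t > T // 2 else 0.0
            prev = 0.5 * prev + rng.gauss(0.0, 1.0) + d_level
            y.append(prev)
        total += ar1_slope(y)
    return total / reps


# Average estimated a1 for no break, a small break, and a large break.
results = {size: mean_slope(size) for size in (0.0, 1.0, 3.0)}
```

The true coefficient is 0.5 in every case, so any tendency of the averages to rise with the size of the break is due solely to ignoring the break.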


Testing for Structural Change 


Returning to the two graphs of Figure 4.4, we see that there may be instances in
which the unaided eye cannot easily detect the difference between the alternative
types of sequences. One econometric procedure to test for unit roots in the presence
of a structural break involves splitting the sample into two parts and using
Dickey-Fuller tests on each part. The problem with this procedure is that the degrees
of freedom for each of the resulting regressions are diminished. It is preferable
to have a single test based on the full sample.

Perron (1989) goes on to develop a formal procedure to test for unit roots in the
presence of a structural change at time period \(t = \tau + 1\). Consider the null hypothesis
of a one-time jump in the level of a unit root process against the alternative of a
one-time change in the intercept of a trend stationary process. Formally, let the null
and alternative hypotheses be

\[ H_1: y_t = a_0 + y_{t-1} + \mu_1 D_P + \epsilon_t \tag{4.31} \]
\[ A_1: y_t = a_0 + a_2 t + \mu_2 D_L + \epsilon_t \tag{4.32} \]

where \(D_P\) represents a pulse dummy variable such that \(D_P = 1\) if \(t = \tau + 1\) and
zero otherwise, and \(D_L\) represents a level dummy variable such that \(D_L = 1\)
if \(t > \tau\) and zero otherwise.

Under the null hypothesis, \(\{y_t\}\) is a unit root process with a one-time jump in the
level of the sequence in period \(t = \tau + 1\). Under the alternative hypothesis, \(\{y_t\}\) is
trend stationary with a one-time jump in the intercept. Figure 4.5 can help you to
visualize the two hypotheses. Simulating (4.31) by setting \(a_0 = 1\) and using 100
realizations for the \(\{\epsilon_t\}\) sequence, the erratic line in Figure 4.5 illustrates the time
path under the null hypothesis. You can see the one-time jump in the level of the
process occurring in period 51. Thereafter, the \(\{y_t\}\) sequence continues the original
random walk plus drift process. The alternative hypothesis posits that the \(\{y_t\}\) sequence
is stationary around the broken trend line. Up to \(t = \tau\), \(\{y_t\}\) is stationary
around \(a_0 + a_2 t\) and beginning \(\tau + 1\), \(y_t\) is stationary around \(a_0 + a_2 t + \mu_2\). As illustrated
by the broken line, there is a one-time increase in the intercept of the trend if
\(\mu_2 > 0\).

The econometric problem is to determine whether an observed series is best
modeled by (4.31) or (4.32). The implementation of Perron’s (1989) technique is
straightforward:

Figure 4.5 Alternative representations of structural change.


STEP 1: Detrend the data by estimating the alternative hypothesis and calling the
residuals \(\hat{y}_t\).
Hence, each value of \(\hat{y}_t\) is the residual from the regression \(y_t = a_0 + a_2 t + \mu_2 D_L + \hat{y}_t\).

STEP 2: Estimate the regression:

\[ \hat{y}_t = a_1 \hat{y}_{t-1} + \epsilon_t \]

Under the null hypothesis of a unit root, the theoretical value of \(a_1\) is
unity. Perron (1989) shows that when the residuals are identically and
independently distributed, the distribution of \(a_1\) depends on the proportion
of observations occurring prior to the break. Denote this proportion by
\(\lambda = \tau/T\), where T = total number of observations.

STEP 3: Perform diagnostic checks to determine if the residuals from Step 2 are serially
uncorrelated. If there is serial correlation, use the augmented form of
the regression:

\[ \hat{y}_t = a_1 \hat{y}_{t-1} + \sum_{i=1}^{k} \beta_i \Delta\hat{y}_{t-i} + \epsilon_t \]

where \(\hat{y}_t\) is the detrended series.

STEP 4: Calculate the t-statistic for the null hypothesis \(a_1 = 1\). This statistic can be
compared to the critical values calculated by Perron. Perron generated
5000 series according to \(H_1\) using values of \(\lambda\) ranging from 0 to 1 by
increments of 0.1. For each value of \(\lambda\), he estimated the regressions
\(\hat{y}_t = a_1 \hat{y}_{t-1} + \epsilon_t\) and calculated the sample distribution of \(a_1\). Naturally, the
critical values are identical to the Dickey-Fuller statistics when \(\lambda = 0\) and
\(\lambda = 1\); in effect, there is no structural change unless \(0 < \lambda < 1\). The maximum
difference between the two statistics occurs when \(\lambda = 0.5\). For \(\lambda = 0.5\),
the critical value of the t-statistic at the 5% level of significance is
-3.76 (which is larger in absolute value than the corresponding Dickey-Fuller
statistic of -3.41). If you find a t-statistic greater in absolute value than the
critical value calculated by Perron, it is possible to reject the null hypothesis
of a unit root.
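Steps 1, 2, and 4 can be sketched in code as follows. This is an illustrative implementation under stated assumptions, not a reproduction of Perron's programs: the helper `ols` solves the normal equations by simple elimination, the augmented regression of Step 3 is omitted, and the simulated series, seed, and function names are hypothetical. The only number taken from the text is the 5% critical value of -3.76 for a break at mid-sample.

```python
import math
import random


def ols(X, y):
    """Solve the normal equations (X'X)b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * v for row, v in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


def perron_t_stat(y, break_period):
    """Detrend with a constant, trend, and level dummy; then t-test a1 = 1 on the residuals."""
    T = len(y)
    X = [[1.0, float(t), 1.0 if t > break_period else 0.0] for t in range(1, T + 1)]
    b = ols(X, y)                                            # Step 1
    resid = [v - sum(bi * xi for bi, xi in zip(b, row)) for row, v in zip(X, y)]
    lag, cur = resid[:-1], resid[1:]
    sxx = sum(v * v for v in lag)
    a1 = sum(v * w for v, w in zip(lag, cur)) / sxx          # Step 2
    ssr = sum((w - a1 * v) ** 2 for v, w in zip(lag, cur))
    se = math.sqrt(ssr / (len(cur) - 1) / sxx)
    return a1, (a1 - 1.0) / se                               # Step 4


def simulate_break_series(T, tau, break_size, seed):
    """Stationary AR(1) with coefficient 0.5 and a level shift after period tau."""
    rng = random.Random(seed)
    y, prev = [], 0.0
    for t in range(1, T + 1):
        prev = 0.5 * prev + rng.gauss(0.0, 1.0) + (break_size if t > tau else 0.0)
        y.append(prev)
    return y


series = simulate_break_series(T=100, tau=50, break_size=1.0, seed=7)
a1_hat, t_stat = perron_t_stat(series, break_period=50)
rejects_unit_root = t_stat < -3.76   # 5% critical value for lambda = 0.5
```

In applied work the residuals from Step 2 would still need the diagnostic checks of Step 3 before the t-statistic is compared with Perron's critical values.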


Of course, it is possible to incorporate Step 1 directly into Steps 2 or 3. To combine
Steps 1 and 3, simply estimate the equation:

\[ y_t = a_0 + a_1 y_{t-1} + a_2 t + \mu_2 D_L + \sum_{i=1}^{k} \beta_i \Delta y_{t-i} + \epsilon_t \]

The t-statistic for the null \(a_1 = 1\) can then be compared to the appropriate critical
value calculated by Perron. In addition, the methodology is quite general in that it
can also allow for a one-time change in the drift or one-time change in both the
mean and drift. For example, it is possible to test the null hypothesis of a permanent
change in the magnitude of the drift term versus the alternative of a change in the slope of
the trend. Here, the null hypothesis is

\[ H_2: y_t = a_0 + y_{t-1} + \mu_2 D_L + \epsilon_t \]

where \(D_L = 1\) if \(t > \tau\) and zero otherwise. With this specification, the \(\{y_t\}\) sequence
is generated by \(\Delta y_t = a_0 + \epsilon_t\) up to period \(\tau\) and \(\Delta y_t = (a_0 + \mu_2) + \epsilon_t\) thereafter. If \(\mu_2 > 0\),
the slope coefficient of the deterministic trend increases for \(t > \tau\). Similarly, a
slowdown in trend growth occurs if \(\mu_2 < 0\).

The alternative hypothesis posits a trend stationary series with a change in the
slope of the trend for \(t > \tau\):

\[ A_2: y_t = a_0 + a_2 t + \mu_3 D_T + \epsilon_t \]

where \(D_T = t - \tau\) for \(t > \tau\) and zero otherwise. For example, suppose that the break
occurs in period 51 so that \(\tau = 50\). Thus, \(D_T(1)\) through \(D_T(50)\) are all zero, so that
for the first 50 periods, \(\{y_t\}\) evolves as \(y_t = a_0 + a_2 t + \epsilon_t\). Beginning with period 51,
\(D_T(51) = 1\), \(D_T(52) = 2, \ldots\), so that for \(t > \tau\), \(\{y_t\}\) evolves as
\(y_t = (a_0 - \mu_3\tau) + (a_2 + \mu_3)t + \epsilon_t\). Hence, \(D_T\) changes the slope of the deterministic trend line. The slope of the
trend is \(a_2\) for \(t \le \tau\) and \(a_2 + \mu_3\) for \(t > \tau\).
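The three dummy variables used in these hypotheses are simple functions of the time index and the last prebreak period. A minimal sketch, with hypothetical function names, for a break in period 51 (so that the last prebreak period is 50):

```python
def pulse_dummy(t, tau):
    """D_P: equals 1 only in the first postbreak period, t = tau + 1."""
    return 1 if t == tau + 1 else 0


def level_dummy(t, tau):
    """D_L: equals 1 in every period after tau."""
    return 1 if t > tau else 0


def trend_dummy(t, tau):
    """D_T: counts the number of periods elapsed since tau, and is 0 up to tau."""
    return t - tau if t > tau else 0


tau = 50
rows = [(t, pulse_dummy(t, tau), level_dummy(t, tau), trend_dummy(t, tau))
        for t in range(49, 54)]
```

Each tuple in `rows` holds the period followed by the values of the pulse, level, and trend dummies, so the switch from zero can be read off around the break date.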

To be even more general, it is possible to combine the two null hypotheses \(H_1\)
and \(H_2\). A change in both the level and drift of a unit root process can be represented
by

\[ H_3: y_t = a_0 + y_{t-1} + \mu_1 D_P + \mu_2 D_L + \epsilon_t \]

where \(D_P\) and \(D_L\) are the pulse and level dummies defined above.

The appropriate alternative for this case is

\[ A_3: y_t = a_0 + a_2 t + \mu_2 D_L + \mu_3 D_T + \epsilon_t \]

Again, the procedure entails estimating the regression \(A_2\) or \(A_3\). Next, using the
residuals \(\hat{y}_t\), estimate the regression:

\[ \hat{y}_t = a_1 \hat{y}_{t-1} + \epsilon_t \]

If the errors from this second regression equation do not appear to be white noise,
estimate the equation in the form of an augmented Dickey-Fuller test. The
t-statistic for the null hypothesis \(a_1 = 1\) can be compared to the critical values calculated
by Perron (1989). For \(\lambda = 0.5\), Perron reports the critical value of the t-statistic
at the 5% significance level to be -3.96 for \(H_2\) and -4.24 for \(H_3\).


Perron’s Test for Structural Change 


Perron (1989) used his analysis of structural change to challenge the findings of
Nelson and Plosser (1982). With the very same variables used, his results indicate
that most macroeconomic variables are not characterized by unit root processes.
Instead, the variables appear to be TS processes coupled with structural breaks.
According to Perron (1989), the stock market crash of 1929 and dramatic oil price
increase of 1973 were exogenous shocks having permanent effects on the mean of
most macroeconomic variables. The crash induced a one-time fall in the mean.
Otherwise, macroeconomic variables appear to be trend stationary.

All variables in Perron’s study (except real wages, stock prices, and the stationary
unemployment rate) appeared to have a trend with a constant slope and exhibited
a major change in the level around 1929. In order to entertain various hypotheses
concerning the effects of the stock market crash, consider the regression
equation:

\[ y_t = a_0 + \mu_1 D_L + \mu_2 D_P + a_2 t + a_1 y_{t-1} + \sum_{i=1}^{k} \beta_i \Delta y_{t-i} + \epsilon_t \]

where \(D_P(1930) = 1\) and zero otherwise
\(D_L = 1\) for all t beginning in 1930 and zero otherwise

Under the presumption of a one-time change in the mean of a unit root process,
\(a_1 = 1\), \(a_2 = 0\), and \(\mu_1 = 0\). Under the alternative hypothesis of a permanent one-time
break in the trend stationary model, \(a_1 < 1\) and \(\mu_2 = 0\). Perron’s (1989) results using
real GNP, nominal GNP, and industrial production are reported in Table 4.5. Given
the length of each series, the 1929 crash means that \(\lambda\) is 1/3 for both real and nominal
GNP and equal to 2/3 for industrial production. Lag lengths (i.e., the values of
k) were determined using t-tests on the coefficients \(\beta_i\). The value k was selected if
the t-statistic on \(\beta_k\) was greater than 1.60 in absolute value and the t-statistic on \(\beta_i\)
for i > k was less than 1.60.
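The lag-length rule amounts to working down from the longest lag considered and stopping at the first lag whose coefficient is significant by the 1.60 cutoff. The sketch below assumes that the t-statistic on the last lagged difference has already been computed for each candidate lag length; the function name and the numbers in the example are made up for illustration and are not Perron's estimates.

```python
def select_lag_length(t_stats, threshold=1.60):
    """Return the largest k whose last-lag |t| exceeds the threshold (0 if none does)."""
    for k in sorted(t_stats, reverse=True):
        if abs(t_stats[k]) > threshold:
            return k
    return 0


# Hypothetical t-statistics on the last lag, keyed by candidate lag length.
example_t_stats = {1: 3.1, 2: -2.4, 3: 1.9, 4: 0.7, 5: -1.2}
chosen_k = select_lag_length(example_t_stats)
```

Here lags 4 and 5 fall below the cutoff in absolute value while lag 3 does not, so the rule settles on three lagged differences.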

First, consider the results for real GNP. When we examine the last column of the
table, it is clear that there is little support for the unit root hypothesis; the estimated
value of \(a_1 = 0.282\) is significantly different from unity at the 1% level. Instead,
real GNP appears to have a deterministic trend (\(a_2\) is estimated to be over five standard
deviations from zero). Also note that the point estimate \(\mu_1 = -0.189\) is significantly
different from zero at conventional levels. Thus, the stock market crash is estimated
to have induced a permanent one-time decline in the intercept of real GNP.

These findings receive additional support since the estimated coefficients and
their t-statistics are quite similar across the three equations. All values of \(a_1\) are
about five standard deviations from unity, whereas the coefficients of the deterministic
trends (\(a_2\)) are all over five standard deviations from zero. Since all estimated
values of \(\mu_1\) are significant at the 1% level and negative, the data seem to support
the contention that real macroeconomic variables are TS, except for a structural
break resulting from the stock market crash.


Table 4.5 Retesting Nelson and Plosser’s Data for Structural Change

                        T     lambda  k    a_0      mu_1      mu_2      a_2      a_1
Real GNP                62    0.33    8    3.44     -0.189    -0.018    0.027    0.282
                                           (5.07)   (-4.28)   (-0.30)   (5.05)   (-5.03)
Nominal GNP             62    0.33    8    5.69     -0.360    0.100     0.036    0.471
                                           (5.44)   (-4.77)   (1.09)    (5.44)   (-5.42)
Industrial production   111   0.66    8    0.120    -0.298    -0.095    0.032    0.322
                                           (4.37)   (-4.56)   (-0.095)  (5.42)   (-5.47)

Notes: 1. T = number of observations
lambda = proportion of observations occurring before the structural change
k = lag length
2. The appropriate t-statistics are in parentheses. For a_0, mu_1, mu_2, and a_2, the null is that the
coefficient is equal to zero. For a_1, the null hypothesis is a_1 = 1. Note that all estimated values of
a_1 are significantly different from unity at the 1% level.


Tests with Simulated Data 
To further illustrate the procedure, 100 random numbers were drawn to represent
the \(\{\epsilon_t\}\) sequence. By setting \(y_0 = 0\), the next 100 values in the \(\{y_t\}\) sequence were
drawn as

\[ y_t = 0.5y_{t-1} + \epsilon_t + D_L \]

where \(D_L = 0\) for t = 1, ..., 50
\(D_L = 1\) for t = 51, ..., 100

Thus, the simulation is identical to (4.29), except that the magnitude of the structural
break is diminished. This simulated series is on the data file labeled
BREAK.PRN; you should try to reproduce the following results. If you were to plot
the data, you would see the same pattern as in Figure 4.4. However, if you did not
plot the data or were otherwise unaware of the break, you might easily conclude
that the \(\{y_t\}\) sequence has a unit root. The ACF of the \(\{y_t\}\) sequence suggests a unit
root process; for example, the first six autocorrelations are

Lag:    1       2       3       4       5       6
        0.94    0.88    0.84    0.81    0.77    0.72

and the ACF of the first differences is:

Lag:    1       2       3       4       5       6
        -0.002  -0.201  -0.112  0.079   -0.010  -0.061


Dickey-Fuller tests yield

\(\Delta y_t = -0.0233y_{t-1} + \epsilon_t\)    t-statistic for \(\gamma = 0\): -0.98495
\(\Delta y_t = 0.0661 - 0.0566y_{t-1} + \epsilon_t\)    t-statistic for \(\gamma = 0\): -1.70630
\(\Delta y_t = -0.0488 - 0.1522y_{t-1} + 0.004t + \epsilon_t\)    t-statistic for \(\gamma = 0\): -2.73397

Diagnostic tests indicate that longer lags are not needed. Regardless of the presence
of the constant or the trend, the \(\{y_t\}\) sequence appears to be difference stationary.
Of course, the problem is that the structural break biases the data toward suggesting
a unit root.

Now, with the Perron procedure, the first step is to estimate the model \(y_t = a_0 + a_2 t + \mu_2 D_L + \hat{y}_t\).
The residuals from this equation are the detrended \(\{\hat{y}_t\}\) sequence.
The second step is to test for a unit root in the residuals by estimating \(\hat{y}_t = a_1 \hat{y}_{t-1} + \epsilon_t\).
The resulting regression is:

\[ \hat{y}_t = 0.4843\hat{y}_{t-1} + \epsilon_t \]

In the third step, all the diagnostic statistics indicate that \(\{\epsilon_t\}\) approximates a
white-noise process. Finally, the t-statistic for \(a_1 = 1\) is -5.396. Hence, we can reject
the null of a unit root and conclude that the simulated data are stationary around a
breakpoint at t = 51.

Some care must be used in applying Perron’s procedure since it assumes that the
date of the structural break is known. In your own work, if the date of the break is
uncertain, you should consult Perron and Vogelsang (1992). In fact, the entire
July 1992 issue of the Journal of Business and Economic Statistics is devoted to
breakpoints and unit roots.


7. PROBLEMS IN TESTING FOR UNIT ROOTS


There is a substantial literature concerning the appropriate use of the various 
Dickey—Fuller test statistics. The focus of this ongoing research concerns the power 
of the test and presence of the deterministic regressors in the estimating equations. 
Although many details are beyond the level of this text, it is important to be aware 
of some of the difficulties entailed in testing for the presence of a trend (either de- 
terministic or stochastic) in the data-generating process. 


Power


Formally, the power of a test is equal to the probability of rejecting a false null
hypothesis (i.e., 1 minus the probability of a type II error). Monte Carlo simulations
have shown that the power of the various Dickey-Fuller and Phillips-Perron tests
is very low; unit root tests do not have the power to distinguish between a unit root
and near unit root process. Thus, these tests will too often indicate that a series contains
a unit root. Moreover, they have little power to distinguish between trend stationary
and drifting processes. In finite samples, any trend stationary process can be
arbitrarily well approximated by a unit root process, and a unit root process can be
arbitrarily well approximated by a trend stationary process. These results should not
be too surprising after examining Figure 4.6. The top graph (a) of the figure shows
a stationary process and unit root process. So as not to bias the results in any particular
direction, the simulation uses the same 100 values of \(\{\epsilon_t\}\) that were used in
Figure 4.4. Using these 100 realizations of \(\{\epsilon_t\}\), we constructed two sequences as:

\[ y_t = 1.1y_{t-1} - 0.1y_{t-2} + \epsilon_t \]
\[ z_t = 1.1z_{t-1} - 0.15z_{t-2} + \epsilon_t \]

The \(\{y_t\}\) sequence has a unit root; the roots of the \(\{z_t\}\) sequence are 0.9405 and
0.1595. Although \(\{z_t\}\) is stationary, it can be called a near unit root process. If you
did not know the actual data-generating processes, it would be difficult to tell that
only \(\{z_t\}\) is stationary.
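The characteristic roots quoted for the two second-order processes follow from the quadratic formula. A short check, assuming real roots and using a hypothetical function name:

```python
import math


def ar2_roots(a1, a2):
    """Characteristic roots of y_t = a1*y_{t-1} + a2*y_{t-2} + e_t, i.e. of r^2 - a1*r - a2 = 0."""
    disc = a1 * a1 + 4.0 * a2
    if disc < 0:
        raise ValueError("complex roots are not handled in this sketch")
    root = math.sqrt(disc)
    return (a1 + root) / 2.0, (a1 - root) / 2.0


y_roots = ar2_roots(1.1, -0.1)    # unit root process
z_roots = ar2_roots(1.1, -0.15)   # near unit root process
```

The larger root is unity for the first process and lies just inside the unit circle for the second, which is why the two simulated paths look so much alike over 100 observations.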

Similarly, as illustrated in the lower graph (b) of Figure 4.6, it can be quite difficult
to distinguish between a trend stationary and unit root plus drift process. Still
using the same 100 values of \(\{\epsilon_t\}\), we can construct two other sequences as:

\[ w_t = 1 + 0.02t + \epsilon_t \]
\[ x_t = 0.02 + x_{t-1} + \epsilon_t/3 \]

where \(x_0 = 1\).

Here, the trend and drift terms dominate the time paths of the two sequences.
Again, it is very difficult to distinguish between the sequences. This is especially
true since dividing each realization of \(\epsilon_t\) by 3 acts to smooth out the \(\{x_t\}\) sequence.
Just as it is difficult for the naked eye to perceive the differences in the sequences,
it is also difficult for the Dickey-Fuller and Phillips-Perron tests to select the correct
specification.

It is easy to show that a trend stationary process can be made to mimic a unit root
process arbitrarily well. As discussed in Chapter 3, it is possible to write the random
walk plus noise model in the form:

\[ y_t = \mu_t + \eta_t \]
\[ \mu_t = \mu_{t-1} + \epsilon_t \]

where \(\eta_t\) and \(\epsilon_t\) are both independent white-noise processes with variances of \(\sigma_\eta^2\)
and \(\sigma^2\), respectively. Suppose that we can observe the \(\{y_t\}\) sequence, but
cannot directly observe the separate shocks affecting \(y_t\). If the variance of
\(\epsilon_t\) is not zero, \(\{y_t\}\) is the unit root process:

\[ y_t = \mu_0 + \sum_{i=1}^{t} \epsilon_i + \eta_t \tag{4.33} \]


Figure 4.6 (a) Stationary and unit root processes. (b) Trend stationary and unit root processes (random walk plus drift versus trend stationary process).


On the other hand, if $\sigma^2 = 0$, then all values of $\{\varepsilon_t\}$ are
constant, so that $\varepsilon_t = \varepsilon_{t-1} = \cdots = \varepsilon_0$. To
maintain the same notation as in previous chapters, define this initial value of
$\varepsilon_0$ as $a_0$. It follows that $\mu_t = \mu_0 + a_0 t$, so $\{y_t\}$ is
the trend stationary process:

$$y_t = \mu_0 + a_0 t + \eta_t \tag{4.34}$$

Thus, the difference between the difference stationary process of (4.33) and the
trend stationary process of (4.34) concerns the variance of $\varepsilon_t$. Having
observed the composite effects of the two shocks, but not the individual components
$\eta_t$ and $\varepsilon_t$, we see that there is no simple way to determine whether
$\sigma^2$ is exactly equal to zero. This is particularly true if the data-generating
process is such that $\sigma_\eta^2$ is large relative to $\sigma^2$. In a finite
sample, arbitrarily increasing $\sigma_\eta^2$ will make it virtually impossible to
distinguish between a TS and DS series.
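The two cases can be sketched with one simulator. The code below is illustrative only: it assumes normally distributed shocks and assumes that $\varepsilon_t$ has mean $a_0$, so that setting its standard deviation to zero reproduces the constant-shock case; the function name and parameter values are hypothetical.

```python
import random

def random_walk_plus_noise(n, mu0, a0, sigma_eps, sigma_eta, seed=0):
    """Simulate y_t = mu_t + eta_t with mu_t = mu_{t-1} + eps_t.

    Assumption: eps_t ~ N(a0, sigma_eps^2) and eta_t ~ N(0, sigma_eta^2).
    With sigma_eps = 0 every eps_t equals a0, so mu_t = mu0 + a0 * t and
    y_t is the trend stationary process mu0 + a0 * t + eta_t.
    """
    rng = random.Random(seed)
    mu, y = mu0, []
    for _ in range(n):
        mu += a0 + sigma_eps * rng.gauss(0.0, 1.0)   # stochastic trend
        y.append(mu + sigma_eta * rng.gauss(0.0, 1.0))  # add the noise term
    return y

# Same trend parameters; only the variance of eps differs.
ds_series = random_walk_plus_noise(100, 1.0, 0.02, sigma_eps=0.1, sigma_eta=1.0)
ts_series = random_walk_plus_noise(100, 1.0, 0.02, sigma_eps=0.0, sigma_eta=1.0)
print(ds_series[-1], ts_series[-1])
```

When `sigma_eta` is large relative to `sigma_eps`, the two simulated paths tend to look alike in a sample of this size, in line with the argument above.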

It also follows that a trend stationary process can arbitrarily well approximate a
unit root process. If the stochastic portion of the trend stationary process has
sufficient variance, it will not be possible to distinguish between the unit root
and trend stationary hypotheses. For example, the random walk plus drift model (a
difference stationary process) can be arbitrarily well represented by the model
$y_t = a_0 + a_1 y_{t-1} + \varepsilon_t$ by increasing $\sigma^2$ and allowing $a_1$
to get sufficiently close to unity. Both these models can be approximated by (4.34).

Does it matter that it is often impossible to distinguish between borderline
stationary, trend stationary, and unit root processes? The realistic answer is that
it depends on the question at hand. In borderline cases, the short-run forecasts
from the alternative models may have nearly identical forecasting performance. In
fact, Monte Carlo studies indicate that when the true data-generating process is
stationary but has a root close to unity, the one-step-ahead forecasts from a
differenced model are usually superior to the forecasts from a stationary model.
However, the long-run forecasts of a model with a deterministic trend will be quite
different from those of the other models.⁹




Determination of the Deterministic Regressors 


Unless the researcher knows the actual data-generating process, there is a question
concerning whether it is most appropriate to estimate (4.12), (4.13), or (4.14). It
might seem reasonable to test the hypothesis $\gamma = 0$ using the most general of
the models, that is,

$$\Delta y_t = a_0 + \gamma y_{t-1} + a_2 t + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t \tag{4.35}$$


After all, if the true process is a random walk process, this regression should find
that $a_0 = \gamma = a_2 = 0$. One problem with this line of reasoning is that the
presence of the additional estimated parameters reduces degrees of freedom and the
power of the test. Reduced power means that the researcher may conclude that the
process contains a unit root when, in fact, none is present. The second problem is
that the appropriate statistic (i.e., the $\tau$, $\tau_\mu$, and $\tau_\tau$) for
testing $\gamma = 0$ depends on which regressors are included in the model. As you
can see by examining the three Dickey–Fuller tables, for a given significance level,
the confidence intervals around $\gamma = 0$ dramatically expand if a drift and time
trend are included in the model. This is quite different from the case in which
$\{y_t\}$ is stationary. The distribution of the t-statistic does not depend on the
presence of the other regressors when stationary variables are used.
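To make the dependence on the included regressors concrete, the following sketch computes the t-ratio on $\gamma$ for the three specifications using ordinary least squares written in plain Python. It is a simplified illustration, not the author's code: lagged changes are omitted (so it corresponds to the unaugmented test), the helper names `ols` and `df_tau` are hypothetical, and the simulated random walk uses an arbitrary seed.

```python
import random

def ols(X, y):
    """OLS via the normal equations; returns (coefficients, standard errors)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I].
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance
    se = [(s2 * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, se

def df_tau(y, constant=True, trend=True):
    """t-ratio on gamma in dy_t = [a0] + gamma*y_{t-1} + [a2*t] + e_t."""
    X, dy = [], []
    for t in range(1, len(y)):
        row = [y[t - 1]]
        if constant:
            row.append(1.0)
        if trend:
            row.append(float(t))
        X.append(row)
        dy.append(y[t] - y[t - 1])
    beta, se = ols(X, dy)
    return beta[0] / se[0]

rng = random.Random(42)
walk = [0.0]
for _ in range(100):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))
for label, c, tr in [("tau", False, False), ("tau_mu", True, False),
                     ("tau_tau", True, True)]:
    print(label, round(df_tau(walk, constant=c, trend=tr), 3))
```

Each printed ratio would have to be compared with its own Dickey–Fuller critical value, not with a Student's t-table.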

The point is that it is important to use a regression equation that mimics the
actual data-generating process. If we inappropriately omit the intercept or time
trend, the power of the test can go to zero.¹⁰ For example, if as in (4.35), the
data-generating process includes a trend, omitting the term $a_2 t$ imparts an upward
bias in the estimated value of $\gamma$. On the other hand, extra regressors increase
the absolute value of the critical values so that you may fail to reject the null of
a unit root.

To illustrate the problem, suppose that the time series $\{y_t\}$ is assumed to be
generated by the random walk plus drift process:

$$y_t = a_0 + a_1 y_{t-1} + \varepsilon_t, \quad a_0 \neq 0 \text{ and } a_1 = 1 \tag{4.36}$$

where the initial condition $y_0$ is given and $t = 1, 2, \ldots, T$.


If there is no drift, it is inappropriate to include the intercept term since the
power of the Dickey–Fuller test is reduced. When the drift is actually in the
data-generating process, omitting $a_0$ from the estimating equation also reduces the
power of the test in finite samples. How do you know whether to include a drift or
time trend in performing the tests? The key problem is that the tests for unit roots
are conditional on the presence of the deterministic regressors and tests for the
presence of the deterministic regressors are conditional on the presence of a unit
root.

Campbell and Perron (1991) report the following results concerning unit root tests: 


1. When the estimated regression includes at least all the deterministic elements in
the actual data-generating process, the distribution of $\gamma$ is nonnormal under
the null of a unit root. The distribution itself varies with the set of parameters
included in the estimating equation.


2. If the estimated regression includes deterministic regressors that are not in the 
actual data-generating process, the power of the unit root test against a station- 
ary alternative decreases as additional deterministic regressors are added. 


3. If the estimated regression omits an important deterministic trending variable
present in the true data-generating process, such as the expression $a_2 t$ in (4.35),
the power of the t-statistic test goes to zero as the sample size increases. If the
estimated regression omits a nontrending variable (i.e., the mean or a change in
the mean), the t-statistic is consistent, but the finite sample power is adversely
affected and decreases as the magnitude of the coefficient on the omitted component
increases.




4. Estimating (4.13) or (4.14), we observe that the $\tau_\mu$, $\tau_\tau$, $\phi_1$,
$\phi_2$, and $\phi_3$ statistics have the asymptotic distributions tabulated by
Dickey and Fuller (1979, 1981). The critical values of the various statistics depend
on sample size. However, the sample variance of $\{y_t\}$ will be dominated by the
presence of a trend or drift. We saw an example of this phenomenon in Figure 3.12 of
Chapter 3. The time path of the random walk plus drift model in graph (b) is swamped
by the presence of the drift term. The fact that the stochastic trend is precisely
the same as in graph (a) has little effect on the overall appearance of the series.
Although the proof is beyond the scope of this text, the $\tau_\mu$ and $\tau_\tau$
statistics converge to the standardized normal. Specifically,

$$\sum_{t=1}^{T} y_t^2 \Rightarrow a_2^2 T^5/20 \quad \text{if } a_2 \neq 0$$
$$\sum_{t=1}^{T} y_t^2 \Rightarrow a_0^2 T^3/3 \quad \text{if } a_0 \neq 0 \text{ and } a_2 = 0$$

Only when both $a_0$ and $a_2$ equal zero in the regression equation and
data-generating process do the nonstandard Dickey–Fuller distributions dominate. If
the data-generating process is known to contain a trend or drift, the null hypothesis
$\gamma = 0$ can be tested using the standardized normal distribution.
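The growth rates of the sum of squares can be checked numerically for the deterministic part alone. This sketch rests on a simplifying assumption: it drops the stochastic component entirely and starts from $y_0 = 0$, so that a unit root process with drift $a_0$ and trend coefficient $a_2$ has deterministic path $y_t = a_0 t + a_2 t(t+1)/2$. The function name is hypothetical.

```python
def scaled_sum_of_squares(T, a0=0.0, a2=0.0):
    """Sum of y_t^2 for y_t = a0*t + a2*t*(t+1)/2, divided by T^5 when a
    trend is present (a2 != 0) and by T^3 otherwise."""
    total = 0.0
    for t in range(1, T + 1):
        y = a0 * t + a2 * t * (t + 1) / 2.0  # deterministic path only
        total += y * y
    power = 5 if a2 != 0 else 3
    return total / T ** power

for T in (100, 1000, 10000):
    drift_only = scaled_sum_of_squares(T, a0=2.0)           # limit a0^2 / 3
    with_trend = scaled_sum_of_squares(T, a0=2.0, a2=0.5)   # limit a2^2 / 20
    print(T, round(drift_only, 5), round(with_trend, 5))
```

As T grows, the first column should approach $a_0^2/3$ and the second $a_2^2/20$, which is why the deterministic terms dominate the sample variance.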


The direct implication of these four findings is that the researcher may fail to
reject the null hypothesis of a unit root because of a misspecification concerning
the deterministic part of the regression. Too few or too many regressors may cause a
failure of the test to reject the null of a unit root. Although we can never be sure
that we are including the appropriate deterministic regressors in our econometric
model, there are some useful guidelines. Dolado, Jenkinson, and Sosvilla-Rivero
(1990) suggest the following procedure to test for a unit root when the form of the
data-generating process is unknown. The following is a straightforward modification
of their method:


STEP 1: As shown in Figure 4.7, start with the least restrictive of the plausible
models (which will generally include a trend and drift) and use the $\tau_\tau$
statistic to test the null hypothesis $\gamma = 0$. Unit root tests have low power to
reject the null hypothesis; hence, if the null hypothesis of a unit root is
rejected, there is no need to proceed. Conclude that the $\{y_t\}$ sequence does
not contain a unit root.

STEP 2: If the null hypothesis is not rejected, it is necessary to determine whether
too many deterministic regressors were included in Step 1 above.¹¹ Test for the
significance of the trend term under the null of a unit root (e.g., use the
$\tau_{\beta\tau}$ statistic to test the significance of $a_2$). You should try to
gain additional confirmation for this result by testing the hypothesis
$a_2 = \gamma = 0$ using the $\phi_3$ statistic. If the trend is not significant,
proceed to Step 3. Otherwise, if the trend is significant, retest for the presence
of a unit root (i.e., $\gamma = 0$) using the standardized normal distribution. After
all, if a trend is inappropriately included in the estimating equation, the limiting
distribution of $a_2$ is the standardized normal. If the null of a unit root is
rejected, proceed no further; conclude that the $\{y_t\}$ sequence does not contain a
unit root. Otherwise, conclude that the $\{y_t\}$ sequence contains a unit root.

STEP 3: Estimate (4.35) without the trend [i.e., estimate a model in the form of
(4.13)]. Test for the presence of a unit root using the $\tau_\mu$ statistic. If the
null is rejected, conclude that the model does not contain a unit root. If the null
hypothesis of a unit root is not rejected, test for the significance of the
constant (e.g., use the $\tau_{\alpha\mu}$ statistic to test the significance of $a_0$
given $\gamma = 0$). Additional confirmation of this result can be obtained by testing
the hypothesis $a_0 = \gamma = 0$ using the $\phi_1$ statistic. If the drift is not
significant, estimate an equation in the form of (4.12) and proceed to Step 4. If
the drift is significant, test for the presence of a unit root using the
standardized normal. If the null hypothesis of a unit root is rejected, conclude
that the $\{y_t\}$ sequence does not contain a unit root. Otherwise, conclude that
the $\{y_t\}$ sequence contains a unit root.

STEP 4: Estimate (4.35) without the trend or drift, that is, estimate a model in the
form of (4.12). Use $\tau$ to test for the presence of a unit root. If the null
hypothesis of a unit root is rejected, conclude that the $\{y_t\}$ sequence does not
contain a unit root. Otherwise, conclude that the $\{y_t\}$ sequence contains a
unit root.

Figure 4.7 A procedure to test for unit roots. [Flowchart: estimate
$\Delta y_t = a_0 + \gamma y_{t-1} + a_2 t + \sum \beta_i \Delta y_{t-i+1} + \varepsilon_t$
and ask whether $\gamma = 0$. If no, stop and conclude no unit root. If yes, test for
the presence of the trend; if the trend is present, test $\gamma = 0$ using the normal
distribution. If the trend is absent, estimate the model without the trend and repeat
the same questions for the drift. If the drift is also absent, estimate the model
without trend or drift and conclude whether $\{y_t\}$ has a unit root.]

Remember, this procedure is not designed to be applied in a completely mechanical
fashion. Plotting the data is usually an important indicator of the presence of
deterministic regressors. The data shown in Figure 4.1 could hardly be said to
contain a deterministic trend. Moreover, theoretical considerations might suggest the
appropriate regressors. The efficient market hypothesis is inconsistent with the
presence of a deterministic trend in an asset market price. However, the procedure
is a sensible way to test for unit roots when the form of the data-generating process
is completely unknown.

GDP and Unit Roots

Although the methodology outlined in Figure 4.7 can be very useful, it does have
its problems. Each step in the procedure involves a test that is conditioned on all
the previous tests being correct; the significance level of each of the cascading
tests is impossible to ascertain.

The procedure and its inherent dangers are nicely illustrated by trying to determine
if real gross domestic product (GDP) has a unit root. The data are contained in
the file entitled US.WK1 on the data disk; it is a good idea to replicate the results
reported below. If we use quarterly data over the 1960:1 to 1991:4 period, the
correlogram of the logarithm of real GDP exhibits slow decay. However, the first 12
autocorrelations and partial autocorrelations of the logarithmic first difference are

ACF of the logarithmic first difference of real GDP:
Lag 1:  0.3093189   0.2316683   0.0572363   0.0556556  -0.0604932   0.0336679
    7: -0.0476200  -0.1453376  -0.0461222   0.0600729   0.0101171  -0.1695323

PACF of the logarithmic first difference of real GDP:
Lag 1:  0.3093189   0.1503780  -0.0567524   0.0220048  -0.0876589   0.0696282
    7: -0.0507211  -0.1605942   0.0669240   0.1200468  -0.0353431  -0.2423071

Despite the somewhat large partial correlation at lag 12, the Box–Jenkins procedure
yields the ARIMA(0, 1, 2) model:

$$\Delta LGDP_t = \underset{(0.001144)}{0.007018} + (1 + \underset{(0.088250)}{0.262169}L + \underset{(0.082663)}{0.197547}L^2)\varepsilon_t$$

where $LGDP_t = \log(GDP_t)$, so that $\Delta LGDP_t$ is the growth rate of real GDP,
and standard errors are in parentheses.

The model is well estimated in that the residuals appear to be white-noise and all
coefficients are of high quality. For our purposes, the interesting point is that the
$\Delta \log(GDP_t)$ series appears to be a stationary process. Integrating suggests
that $\log(GDP_t)$ has a stochastic and deterministic trend. The deterministic
quarterly growth rate of 0.007018 (close to a 3% annual rate) appears to be quite
reasonable. Now consider the three augmented Dickey–Fuller equations with
t-statistics in parentheses:

$$\Delta LGDP_t = \underset{(2.56548)}{0.79018} - \underset{(-2.54309)}{0.05409}\,LGDP_{t-1} + \underset{(2.27941)}{0.000348}\,t + \underset{(2.83349)}{0.24961}\,\Delta LGDP_{t-1} + \underset{(1.94841)}{0.17273}\,\Delta LGDP_{t-2} \tag{4.37}$$

RSS = 0.0089460783

$$\Delta LGDP_t = \underset{(2.05219)}{0.09600} - \underset{(-1.96196)}{0.00611}\,LGDP_{t-1} + \underset{(2.64113)}{0.23613}\,\Delta LGDP_{t-1} + \underset{(1.52736)}{0.13535}\,\Delta LGDP_{t-2} \tag{4.38}$$

RSS = 0.0093334206

$$\Delta LGDP_t = \underset{(3.82135)}{0.000279}\,LGDP_{t-1} + \underset{(2.93959)}{0.26331}\,\Delta LGDP_{t-1} + \underset{(1.72964)}{0.15443}\,\Delta LGDP_{t-2} \tag{4.39}$$

RSS = 0.0096582756

From (4.37), the t-statistic for the null hypothesis $\gamma = 0$ is -2.54309.
Critical values with 125 usable observations are not reported in the Dickey–Fuller
table.¹² However, with 100 observations, the critical value of $\tau_\tau$ at the 5%
significance level is -3.45; hence, it is not possible to reject the null hypothesis
of a unit root given the presence of the drift term and time trend.

The power of the test may have been reduced due to the presence of an unnecessary
time trend and/or drift term. In Step 2, you test for the presence of the time
trend given the presence of a unit root. In (4.37), the t-statistic for the null
hypothesis that $a_2 = 0$ is 2.27941. Do not let this large value fool you into
thinking that $a_2$ is significantly different from zero. Remember, in the presence
of a unit root, you cannot use the critical values of a t-table; instead, the
appropriate critical values are given by the Dickey–Fuller $\tau_{\beta\tau}$
statistic. As you can see in Table 4.1, the critical value of $\tau_{\beta\tau}$ at
the 5% significance level is 2.79; hence, it is reasonable to conclude that
$a_2 = 0$. The $\phi_3$ statistic to test the joint hypothesis $a_2 = \gamma = 0$
reconfirms this result. If we view (4.37) as the unrestricted model and (4.39) as the
restricted model, there are two restrictions and 120 degrees of freedom in the
unrestricted model; the $\phi_3$ statistic is


$$\phi_3 = \frac{(0.0096582756 - 0.0089460783)/2}{0.0089460783/120} = 4.7766$$


Since the critical value of $\phi_3$ is 6.49, it is possible to conclude that the
restriction $a_2 = \gamma = 0$ is not binding. Thus, proceed to Step 3 where you
estimate the model without the trend. In (4.38), the t-statistic for the null
hypothesis $\gamma = 0$ is -1.96196. Since the critical value of the $\tau_\mu$
statistic is -2.89 at the 5% significance level, the null hypothesis of a unit root
is not rejected at conventional significance levels. Again, the power of this test
will have been reduced if the drift term does not belong in the model. To test for
the presence of the drift, use the $\tau_{\alpha\mu}$ statistic. The calculated
t-statistic is 2.05219, whereas the critical value at the 5% significance level
is 2.54. The $\phi_1$ statistic also suggests that the drift term is zero. Comparing
(4.38) and (4.39), we obtain

$$\phi_1 = \frac{0.0096582756 - 0.0093334206}{0.0093334206/121} = 4.21147365$$
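Both statistics have the usual F-statistic form, so the arithmetic is easy to reproduce. The helper below is a small sketch with a hypothetical name; the RSS values, restriction counts, and degrees of freedom are the ones quoted in the text.

```python
def phi_statistic(rss_restricted, rss_unrestricted, n_restrictions, df_unrestricted):
    """F-type ratio used for the Dickey-Fuller phi statistics."""
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / df_unrestricted
    return numerator / denominator

# (4.39) restricted against (4.37): two restrictions, 120 degrees of freedom.
phi3 = phi_statistic(0.0096582756, 0.0089460783, 2, 120)
# (4.39) restricted against (4.38): one restriction, 121 degrees of freedom.
phi1 = phi_statistic(0.0096582756, 0.0093334206, 1, 121)
print(round(phi3, 4), round(phi1, 4))
```

The computed values must be compared with the Dickey–Fuller critical values for the $\phi$ statistics, not with a standard F-table.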


Proceeding to Step 4 yields (4.39). The point is that the procedure has worked
itself into an uncomfortable corner. The problem is that the positive coefficient for
$\gamma$ (i.e., the estimated value of $\gamma = 0.000279$ is almost four standard
deviations from zero) suggests an explosive process. In Step 3, it was probably
unwise to conclude that the drift term is equal to zero. As you should verify in
Exercise 4 at the end of this chapter, the simple Box–Jenkins ARIMA(0, 1, 2) model
with an intercept of 0.007018 performs better than any of the alternatives.
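The path the GDP example takes through the four steps can be traced with a short decision function. This is a simplified sketch, not the authors' procedure in full: it skips the confirming $\phi$ tests, and two critical values are assumptions not given in the text (-1.95 for the $\tau$ statistic with 100 observations, and -1.645 as a one-sided 5% value from the standardized normal). All names are hypothetical.

```python
def sequential_unit_root_test(stats, crit):
    """Return (step, conclusion) following Steps 1-4 of the procedure."""
    def verdict(rejected):
        return "no unit root" if rejected else "unit root"

    # Step 1: most general model; reject if tau_tau is below its critical value.
    if stats["tau_tau"] < crit["tau_tau"]:
        return 1, verdict(True)
    # Step 2: trend significant under the null? If so, retest with the normal.
    if abs(stats["tau_beta_tau"]) > crit["tau_beta_tau"]:
        return 2, verdict(stats["tau_tau"] < crit["normal"])
    # Step 3: model without the trend.
    if stats["tau_mu"] < crit["tau_mu"]:
        return 3, verdict(True)
    if abs(stats["tau_alpha_mu"]) > crit["tau_alpha_mu"]:
        return 3, verdict(stats["tau_mu"] < crit["normal"])
    # Step 4: model without trend or drift.
    return 4, verdict(stats["tau"] < crit["tau"])

gdp_stats = {"tau_tau": -2.54309, "tau_beta_tau": 2.27941,
             "tau_mu": -1.96196, "tau_alpha_mu": 2.05219, "tau": 3.82135}
critical = {"tau_tau": -3.45, "tau_beta_tau": 2.79, "tau_mu": -2.89,
            "tau_alpha_mu": 2.54,
            "tau": -1.95,       # assumed table value, not quoted in the text
            "normal": -1.645}   # assumed one-sided 5% normal value
result = sequential_unit_root_test(gdp_stats, critical)
print(result)
```

With these inputs the function falls through to Step 4, which mirrors the "uncomfortable corner" described above: the mechanical rule ends at (4.39) even though its positive coefficient on $LGDP_{t-1}$ is implausible.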


SUMMARY AND CONCLUSIONS 


In finite samples, the correlogram of a unit root process will decay slowly. As such, 
a slowly decaying ACF can be indicative of a unit root or near unit root process. 
The issue is especially important since many economic time series appear to have a 
nonstationary component. When you encounter such a time series, do you detrend, 
do you first-difference, or do you do nothing since the series might be stationary? 
Adherents of the Box–Jenkins methodology recommend differencing a nonstationary
variable or variable with a near unit root. For very short-term forecasts, the
form of the trend is nonessential. Differencing also reveals the pattern of the other
autoregressive and moving average coefficients. However, as the forecast horizon
expands, the precise form of the trend becomes increasingly important. Stationarity
implies the absence of a trend and long-run mean reversion. A deterministic trend
implies steady increases (or decreases) into the infinite future. Forecasts of a
series with a stochastic trend converge to a steady level. As illustrated by the
distinction between real business cycles and the more traditional formulations, the
nature of the trend may have important theoretical implications.
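The slowly decaying correlogram mentioned at the start of this summary is easy to reproduce. The sketch below computes a sample ACF for a simulated random walk and for its first difference; the normal shocks, seed, and function name are assumptions made only for illustration.

```python
import random

def sample_acf(y, max_lag):
    """Sample autocorrelations for lags 1..max_lag."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

rng = random.Random(7)
walk = [0.0]
for _ in range(199):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))
diff = [walk[t] - walk[t - 1] for t in range(1, len(walk))]

print("levels:     ", [round(r, 2) for r in sample_acf(walk, 6)])
print("differences:", [round(r, 2) for r in sample_acf(diff, 6)])
```

The autocorrelations of the levels should die out slowly, while those of the differences should be close to zero, which is why differencing reveals the remaining ARMA pattern.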

The usual t-statistics and F-statistics are not applicable to determine whether or
not a sequence has a unit root. Dickey and Fuller (1979, 1981) provide the
appropriate test statistics to determine whether a series contains a unit root, unit
root plus drift, and/or unit root plus drift plus a time trend. The tests can also be
modified to account for seasonal unit roots. If the residuals of a unit root process
are heterogeneous or weakly dependent, the alternative Phillips–Perron test can be
used.

Structural breaks will bias the Dickey—Fuller and Phillips—Perron tests toward 
the nonrejection of a unit root. Perron (1989) shows how it is possible to incorpo- 
rate a known structural change into the tests for unit roots. Caution needs to be ex- 
ercised since it is always possible to argue that structural change has occurred; each 
year has something different about it than the previous year. In an interesting exten- 
sion, Perron and Vogelsang (1992) show how to test for a unit root when the pre- 
cise date of the structural break is unknown. 

All the aforementioned tests have very low power to distinguish between a unit
root and near unit root process. A trend stationary process can be arbitrarily well
approximated by a unit root process, and a unit root process can be arbitrarily well
approximated by a trend stationary process. Moreover, the testing procedure is
confounded by the presence of the deterministic regressors (i.e., the intercept and
deterministic trend). Too many or too few regressors reduce the power of the tests.

An alternative is to take a Bayesian approach and avoid specific hypothesis test- 
ing altogether. West and Harrison (1989) provide an accessible introduction to 
Bayesian analysis in the context of regression analysis. Zellner (1988) discusses 
some of the philosophical underpinnings of the approach and Leamer (1986) pro- 
vides a straightforward application to estimating the determinants of inflation. Sims 
(1988) is the standard reference for the Bayesian approach to unit roots. 


QUESTIONS AND EXERCISES 


1. The columns in the file labeled REAL.PRN contain the logarithm of the real
exchange rates for Canada, Japan, Germany, and the U.K. The four series are
called RCAN, RJAP, RGER, and RUK, respectively. As in Section 4, each series is
constructed as $r_t = e_t + p_t^* - p_t$

where r  = log of the real exchange rate
      e  = log of the dollar price of foreign exchange
      p* = log of the foreign wholesale price index
      p  = log of the U.S. wholesale price index

All series run from February 1973 through December 1989, and each is expressed as
an index number such that February 1973 = 1.00.


You should find that the data have the following properties: 


Series  Observations  Mean           Standard Error  Minimum        Maximum
RCAN    203           0.93041911330  0.05685010789   0.83472000000  1.03930000000
RGER    203           1.07711822660  0.15732887872   0.64541000000  1.34009000000
RJAP    203           1.16689172414  0.13981473422   0.91620000000  1.50787000000
RUK     203           1.09026873892  0.14524762980   0.70991900000  1.38482000000



A. For each sequence, find the ACF and PACF of (i) the level of the real ex- 
change rate; (ii) the first difference of the real exchange rate; and (iii) the de- 
trended real exchange rate. For example, for Canada you should find 


ACF:
1:  0.95109959  0.91691527  0.89743824  0.86897993  0.84708012  0.81911904
7:  0.79706303  0.77888188  0.75410092  0.72946966  0.70020306  0.65782904

ACF of the first difference:
1: +0.1562001  -0.1531103   0.0443029  -0.0152957   0.1053500  -0.0740475
7: -0.0475489   0.0597755  -0.0255490   0.0142241   0.1810469  -0.1151413


B. Explain why it is not possible to determine whether the sequence is stationary
or nonstationary by the simple examination of the ACF and PACF.


C. Including a constant, use Dickey–Fuller and augmented Dickey–Fuller tests
(with 12 monthly lags) to test whether the series are unit root processes. You
should find that the t-statistics for $\gamma = 0$ are

Series  No lags    12 lags    Trend + 12 lags
RCAN    -1.81305   -1.50810   -0.85650
RJAP    -1.81978   -2.30579   -2.61854
RGER    -1.64297   -2.10719   -2.09955
RUK     -1.55877   -2.51668   -2.57493

i. The last entry in the table means that $\gamma$ is more than 2.57 standard
deviations from zero. A student's t-table indicates that at the 95% significance
level, the critical value is about 1.96 standard deviations. Why is it incorrect to
conclude that the null hypothesis of a unit root can be rejected since the calculated
t-statistic is more than 1.96 standard deviations from zero?

ii. For each entry reported in the table, what are the appropriate statistics to use
($\tau$, $\tau_\mu$, or $\tau_\tau$) in order to test the null hypothesis of a unit
root?


D. If your software package can perform Phillips–Perron tests, reestimate part C
using the Phillips–Perron rather than Dickey–Fuller procedure. You should
find that the t-statistics for $\gamma = 0$ are

Series  No lags    12 lags    Trend + 12 lags
RCAN    -1.82209   -1.60022   -1.10882
RJAP    -1.82886   -2.03795   -2.19736
RGER    -1.65117   -1.85319   -1.88371
RUK     -1.56654   -1.81530   -2.01424


E. Why do you suppose the results from parts C and D are so similar? 


F. Determine whether an intercept term belongs in the regression equations.
Determine whether the time trend should be included in the equations.
Determine whether the intercept and time trend belong in the equations.


G. Use the Japanese data to show that you can reject the null hypothesis of two 
unit roots. 


2. The second column in the file labeled BREAK.PRN contains the simulated data
used in Section 6. You should find:

Series  Observations  Mean     Standard Error  Minimum   Maximum
Y1      100           0.98802  0.99373         -0.78719  2.654697


A. Plot the data to see if you can recognize the effects of the structural break. 


B. Verify the results reported in Section 6. 


3. The third column in the file labeled BREAK.PRN contains another simulated
data set with a structural break at t = 51. You should find

Series  Observations  Mean     Standard Error  Minimum  Maximum
Y2      100           2.21080  1.7816          -1.3413  5.1217


A. Plot the data. Compare your graph to those of Figures 4.4 and 4.5. 


B. Obtain the ACF and PACF of the {Y2,} sequence and first difference of the 
sequence. Do the data appear to be difference stationary? 


C. If as in (4.11), a Dickey–Fuller test is performed including a constant and
trend, you should obtain

Coefficient  Estimate      Standard Error  t-Statistic  Significance
Constant      0.072445666  0.071447971      1.01396     0.31314869
TREND        -0.000101438  0.002120465     -0.04784     0.96194514
Y2(t-1)      -0.022398360  0.034013944     -0.65851     0.51178974


i. In what ways is this regression equation inadequate? 
ii. What diagnostic checks would you want to perform? 


D. Estimate the equation $Y2_t = a_0 + a_2 t + \mu D_t$ and save the residuals. You
should obtain

Coefficient  Estimate      Standard Error  t-Statistic  Significance
Constant     0.4185991020  0.1752103414    2.38912      0.01882282
DUMMY        2.8092054550  0.3097034669    9.07063      0.00000000
TREND        0.0076752509  0.0053644896    1.43075      0.15571516


E. Perform a Dickey–Fuller test on the saved residuals. You should find
$\hat{y}_t = 0.9652471\,\hat{y}_{t-1} + \varepsilon_t$, where the standard error of
$a_1$ = 0.0372. Also perform the appropriate diagnostic tests on this regression to
ensure that the residuals approximate white noise. You should conclude that the
series is a unit root process with a one-time pulse at t = 51.


F. Return to part D but now eliminate the insignificant time trend. How is your 
answer to part E affected? 


4. The sixth column in the file labeled US.WK1 contains the real GDP data used in
Section 7. The quarterly series runs from 1960:1 to 1991:4 and each entry is
expressed in 1985 dollars. You should find that the properties of the series are such
that

Series Name  Observations  Mean
GDP85        128           3.220373E+12


A. Plot the logarithm of real GDP. Do the data suggest any particular form of 
the trend? 


B. Use the Box–Jenkins methodology to verify that an ARIMA(0, 1, 2) model
performs better than an ARIMA(2, 1, 0) model.


C. Calculate the various Dickey–Fuller statistics reported in Section 7. Are
there any indications that it might be inappropriate to accept the hypothesis
$a_0 = 0$?


D. Repeat the procedure using the Phillips—Perron tests. 




ENDNOTES 


1. Issues concerning the possibility of higher-order equations, longer lag lengths,
serial correlation in the residuals, structural change, and the presence of
deterministic components will be considered in due course.

2. The critical values are reported in Table A at the end of this text.

3. Suppose that the estimated value of $\gamma$ is -1.9 (so that the estimate of $a_1$
is -0.9) with a standard error of 0.04. Since the estimated value of $\gamma$ is 2.5
standard errors from -2 [(2 - 1.9)/0.04 = 2.5], the Dickey–Fuller statistics indicate
that we cannot reject the null hypothesis $\gamma = -2$ at the 95% significance level.
Unless stated otherwise, the discussion in the text assumes that $a_1$ is positive.
Also note that if there is no prior information concerning the sign of $a_1$, a
two-tailed test can be conducted.

4. Here we use the notation $e_t$, rather than $\varepsilon_t$, to highlight that the
residuals from such a regression will generally not be white-noise.

5. For the same reason, it is also inappropriate to use one variable that is trend
stationary and another that is difference stationary. In such instances, "time" can
be included as a so-called explanatory variable or the variable in question can be
detrended.

6. Tests using lagged changes in the $\{\Delta y_t\}$ sequence are called augmented
Dickey–Fuller tests.

7. In their simulations, Dickey and Fuller (1981) found that 90% of the calculated
$\phi_3$ statistics were 5.47 or less and 95% were 6.49 or less when the actual data
were generated according to the null hypothesis.

8. Perron's Monte Carlo study allows for a drift and deterministic trend.
Nonetheless, the value of $a_1$ is biased toward unity in the presence of the
deterministic trend.

9. Moreover, Evans and Savin (1981) find that for an AR(1) model, the limiting
distribution of the autoregressive parameter has a normal asymptotic distribution
(for $\rho < 1$). However, when the parameter is near 1, the unit root distribution is
a better finite sample approximation than the asymptotically correct distribution.

10. Campbell and Perron (1991) report that omitting a variable that is growing at
least as fast as any other of the appropriately included regressors causes the power
of the tests to approach zero.

11. Using the most general model in Step 1 is meant to address the problem of
omitting important deterministic regressors.

12. The sample period 1960:1 to 1991:4 contains 128 total observations. Three
observations are lost by creating the two lagged changes.


APPENDIX: Phillips–Perron Test Statistics


Suppose that we have observations 1, 2, ..., T of the $\{y_t\}$ sequence and
estimate the regression equation:

$$y_t = \tilde{a}_0 + \tilde{a}_1 y_{t-1} + \tilde{a}_2 (t - T/2) + u_t$$

In this appendix, we modify our notation slightly for those wishing to read the
work of Phillips and Perron. Fortunately, the changes are minor; simply replace
$\tilde{a}_0$ with $\mu$, $\tilde{a}_1$ with $\alpha$, and $\tilde{a}_2$ with $\beta$.
Thus, suppose we have estimated the regression:


Chapter 5 


MULTIEQUATION
TIME-SERIES MODELS


As we have seen in previous chapters, you can capture many interesting dynamic 
relationships using single-equation time-series methods. In the recent past, many 
time-series texts would end with nothing more than a brief discussion of multi- 
equation models. However, one of the most fertile areas of contemporary time- 
series research concerns multiequation models. The specific aims of this chapter are 
to: 


1. Introduce intervention analysis and transfer function analysis. These two 
techniques generalize the univariate methodology by allowing the time path of 
the “dependent” variable to be influenced by the time path of an “independent” 
or “exogenous” variable. If it is known that there is no feedback, intervention 
and transfer function analysis can be very effective tools for forecasting and hy- 
pothesis testing. 


2. Introduce the concept of a vector autoregression (VAR). The major limitation of 
intervention and transfer function models is that many economic systems do ex- 
hibit feedback. In practice, it is not always known if the time path of a series 
designated to be the “independent” variable has been unaffected by the time 
path of the so-called “dependent” variable. The most basic form of a VAR treats 
all variables symmetrically without making reference to the issue of dependence 
versus independence. 


3. The tools employed by VAR analysis (Granger causality, impulse response
analysis, and variance decompositions) can be helpful in understanding the
interrelationships among economic variables and in the formulation of a more
structured economic model. These tools are illustrated using examples concerning
the fight against transnational terrorism.


4. Develop two new techniques, structural VARs and multivariate decompositions,
that blend economic theory and multiple time-series analysis. Economic
theories contain behavioral, structural, and/or reduced-form relationships that can
be incorporated into a VAR analysis. In a structural VAR, the restrictions of a
particular economic model are imposed on the contemporaneous relationship among
the variables. The dynamic response of each variable to various economic shocks
can be obtained and the restrictions of the model tested. Similarly, long-run
neutrality restrictions can aid in decomposing a series into its temporary and
permanent components. As opposed to the class of univariate decompositions considered
in Chapter 3, decompositions in a VAR framework can be exactly identified.


1. INTERVENTION ANALYSIS 


Beginning in the late 1960s, the international community experienced a serious 
threat from transnational terrorism. Terrorists engage in a wide variety of opera- 
tions including assassinations, armed attacks, bombings, kidnappings, and skyjack- 
ings. Such incidents are particularly heinous since they are often directed at inno- 
cent victims who are not part of the decision-making apparatus that the terrorists 
seek to influence. Although the downing of Pan Am flight 103 over Lockerbie, 
Scotland on December 21, 1988 captured the attention of the international commu- 
nity, skyjacking incidents are actually quite numerous. 

A critical response to the rise in skyjackings occurred when the United States be- 
gan to install metal detectors in all U.S. airports in January 1973. Other interna- 
tional authorities followed shortly. The summation of all transnational plus U.S. 
domestic skyjackings is shown in Figure 5.1. Although the number of skyjacking 
incidents appears to take a sizable and permanent decline at this date, we might be 
interested in actually measuring the effects of installing the metal detectors. If $\{y_t\}$
represents the quarterly total of skyjackings, one might try to take the mean value
of $\{y_t\}$ for all t < 1973:1 and compare it to the mean value of $\{y_t\}$ for all t ≥ 1973:1.
However, such a test is probably inappropriate in time-series analysis. Since successive
values of $y_t$ are serially correlated, some of the effects of the pre-metal-detector
regime could “carry over” to the postintervention date. For example, some
planned skyjacking incidents already in the pipeline might not be deterred as read- 
ily as others. 

Intervention analysis allows for a formal test of a change in the mean of a time
series. Consider the model used in Enders, Sandler, and Cauley (1990) to study the
impact of the metal detector technology on the number of skyjacking incidents:

$$y_t = a_0 + a_1 y_{t-1} + c_0 z_t + \varepsilon_t, \qquad |a_1| < 1 \tag{5.1}$$

where $z_t$ = the intervention (or dummy) variable that takes on the value of zero
prior to 1973:1 and unity beginning in 1973:1
$\varepsilon_t$ = a white-noise disturbance


To explain the nature of the model, notice that for t < 1973:1, the value of $z_t$ is
zero.¹ As such, the intercept term is $a_0$ and the long-run mean of the series is
$a_0/(1 - a_1)$. Beginning in 1973, the intercept term jumps to $a_0 + c_0$ (since $z_{1973:1}$ jumps
to unity). Thus, the initial or impact effect of the metal detectors is given by the
magnitude of $c_0$. The statistical significance of $c_0$ can be tested using a standard
t-test. We would conclude that metal detectors reduced the number of skyjacking
incidents if $c_0$ is negative and statistically different from zero.

[Figure 5.1 Skyjackings. Vertical axis: incidents per quarter (0 to 40); horizontal axis: 1968 to 1988.]

The long-run effect of the intervention, given by $c_0/(1 - a_1)$, is equal to the new
long-run mean $(a_0 + c_0)/(1 - a_1)$ minus the value of the original mean $a_0/(1 - a_1)$.
The various transitional effects can be obtained from the impulse response function.
Using lag operators, rewrite (5.1) as

$$(1 - a_1 L)\,y_t = a_0 + c_0 z_t + \varepsilon_t$$

so that

$$y_t = \frac{a_0}{1 - a_1} + c_0 \sum_{i=0}^{\infty} a_1^i z_{t-i} + \sum_{i=0}^{\infty} a_1^i \varepsilon_{t-i} \tag{5.2}$$

Equation (5.2) is an impulse response function; the interesting twist added by the
intervention variable is that we can obtain the responses of the $\{y_t\}$ sequence to the
interventions. To trace out the effects of metal detectors on skyjackings, suppose
that t = 1973:1 (so that t + 1 = 1973:2, t + 2 = 1973:3, etc.). For time period t, the
impact of $z_t$ on $y_t$ is given by the magnitude of the coefficient $c_0$. The simplest way
to derive the remaining impulse responses is to recognize that (1) $dy_t/dz_{t-i} = dy_{t+i}/dz_t$
and (2) $z_{t+i} = z_t = 1$ for all $i > 0$.

Hence, differentiate (5.2) with respect to $z_{t-1}$, update by one period, and add the
direct effect of $z_{t+1}$, so that

$$dy_{t+1}/dz_t = c_0 + c_0 a_1$$

The presence of the term $c_0$ reflects the direct impact of $z_{t+1}$ on $y_{t+1}$, and the second
term $c_0 a_1$ reflects the effect of $z_t$ on $y_t$ (= $c_0$) multiplied by the effect of $y_t$ on
$y_{t+1}$ (= $a_1$). Continuing in this fashion, we can trace out the entire impulse (or impact)
response function as

$$dy_{t+j}/dz_t = c_0\,[1 + a_1 + \cdots + (a_1)^j]$$

since $z_{t+1} = z_{t+2} = \cdots = 1$.

Taking limits as $j \to \infty$, we can reaffirm that the long-run impact is given by
$c_0/(1 - a_1)$. If it is assumed that $0 < a_1 < 1$, the absolute value of the magnitude of
the impacts is an increasing function of $j$. The further we move from the date
on which the policy was introduced, the greater the absolute value of the magnitude
of the policy response. If $-1 < a_1 < 0$, the policy has a damped oscillating effect on
the $\{y_t\}$ sequence. After the initial jump of $c_0$, the successive values of $\{y_t\}$ oscillate
above and below the long-run level of $c_0/(1 - a_1)$.
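
The transitional effects described above can be traced numerically. The following is a minimal sketch, not the authors' code: the function names and the coefficient values (c0 = -2.0, a1 = ±0.5) are hypothetical and chosen only to show the direct and the oscillating cases of the response $c_0[1 + a_1 + \cdots + a_1^j]$.

```python
def step_response(c0, a1, horizon):
    """Return dy_{t+j}/dz_t for j = 0..horizon when z stays at unity from t on."""
    responses = []
    geometric_sum = 0.0
    for j in range(horizon + 1):
        geometric_sum += a1 ** j              # 1 + a1 + ... + a1**j
        responses.append(c0 * geometric_sum)
    return responses


def long_run_effect(c0, a1):
    """Limit of the step response, c0 / (1 - a1); requires |a1| < 1."""
    if abs(a1) >= 1:
        raise ValueError("convergence requires |a1| < 1")
    return c0 / (1.0 - a1)


# Hypothetical coefficients chosen only for illustration.
direct = step_response(c0=-2.0, a1=0.5, horizon=4)
oscillating = step_response(c0=-2.0, a1=-0.5, horizon=4)
print(direct, long_run_effect(-2.0, 0.5))
print(oscillating, long_run_effect(-2.0, -0.5))
```

With a positive coefficient the responses move monotonically toward the long-run effect; with a negative coefficient they alternate above and below it.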

There are several important extensions to the intervention example provided 
here. Of course, the model need not be a first-order autoregressive process. A more 
general ARMA(p, q) intervention model has the form: 


$$y_t = a_0 + A(L)\,y_{t-1} + c_0 z_t + B(L)\,\varepsilon_t$$


where $A(L)$ and $B(L)$ = polynomials in the lag operator $L$


Also, the intervention need not be the pure jump illustrated in the upper-left-hand 
graph (a) of Figure 5.2. In our study, the value of the intervention sequence jumps 
from zero to unity in 1973:1. However, there are several other possible ways to 
model the intervention function: 


1. Impulse function. As shown in the upper-right-hand graph (b) of the figure, the
function $z_t$ is zero for all periods except in one particular period in which $z_t$ is
unity. This pulse function best characterizes a purely temporary intervention. Of
course, the effects of the single impulse may last many periods due to the
autoregressive nature of the $\{y_t\}$ series.


2. Gradually changing function. An intervention may not reach its full force
immediately. Although the United States began installing metal detectors in airports
in January 1973, it took almost a full year for installations to be completed at
some major international airports. Our intervention study of the impact of metal
detectors on quarterly skyjackings also modeled the $z_t$ series as 1/4 in 1973:1,
1/2 in 1973:2, 3/4 in 1973:3, and 1.0 in 1973:4 and all subsequent periods. This
type of intervention function is shown in the lower-left-hand graph (c) of the figure.


3. Prolonged impulse function. Rather than a single pulse, the intervention may
remain in place for one or more periods and then begin to decay. For a short time,
sky marshals were put on many U.S. flights to deter skyjackings. Since the sky
marshal program was allowed to terminate, the $\{z_t\}$ sequence for sky marshals
might be represented by the decaying function shown in the lower-right-hand
graph (d) of Figure 5.2.
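
The four intervention shapes can be written down as simple dummy sequences. This is an illustrative sketch only: the function names are hypothetical, periods are indexed from zero, and the hold length and decay rate of the prolonged impulse are assumptions, since the text does not give numerical values for that case.

```python
def pure_jump(n, start):
    """Zero before `start`, unity from `start` onward."""
    return [0.0 if t < start else 1.0 for t in range(n)]


def impulse(n, start):
    """Unity in period `start` only."""
    return [1.0 if t == start else 0.0 for t in range(n)]


def gradual(n, start, steps=4):
    """Rises by 1/steps per period (1/4, 1/2, 3/4, 1.0 when steps=4), then stays at unity."""
    return [0.0 if t < start else min(1.0, (t - start + 1) / steps) for t in range(n)]


def prolonged_impulse(n, start, hold=2, decay=0.5):
    """Unity for `hold` periods, then geometric decay (assumed shape and rate)."""
    z = []
    for t in range(n):
        if t < start:
            z.append(0.0)
        elif t < start + hold:
            z.append(1.0)
        else:
            z.append(decay ** (t - start - hold + 1))
    return z


# Quarterly data starting in 1968:1 place 1973:1 at index 20 (20 preintervention quarters).
z_gradual = gradual(n=26, start=20)
print(z_gradual[18:26])
```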


[Figure 5.2 Typical intervention functions. Panels: (a) pure jump; (b) impulse function; (c) gradually changing; (d) prolonged impulse.]


Be aware that the effects of these interventions change if $\{y_t\}$ has a unit root.
From the discussion of Perron (1989) in Chapter 4, you should recall that a pulse
intervention will have a permanent effect on the level of a unit root process.
Similarly, if $\{y_t\}$ has a unit root, a pure jump intervention will act as a drift term.
As indicated in Question 1 at the end of this chapter, an intervention will have a
temporary effect on a unit root process if all values of $\{z_t\}$ sum to zero (e.g., $z_t = 1$,
$z_{t+1} = -0.5$, $z_{t+2} = -0.5$, and all other values of the intervention variable equal zero).
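
The unit root cases can be checked with a deterministic recursion. The sketch below is an illustration under simplifying assumptions: there are no stochastic shocks and no intercept, $c_0$ is set to 1, and the offsetting intervention uses the values 1, -0.5, -0.5, which sum to zero. The function name is hypothetical.

```python
def simulate_level(z, c0=1.0, a1=1.0, y0=0.0):
    """Deterministic path of y_t = a1*y_{t-1} + c0*z_t (no shocks, no intercept)."""
    path, previous = [], y0
    for z_t in z:
        previous = a1 * previous + c0 * z_t
        path.append(previous)
    return path


pulse = [0, 0, 1, 0, 0, 0, 0, 0]
jump = [0, 0, 1, 1, 1, 1, 1, 1]
offsetting = [0, 0, 1, -0.5, -0.5, 0, 0, 0]   # values sum to zero

print(simulate_level(pulse))            # unit root: the level shifts permanently
print(simulate_level(jump))             # unit root: the jump acts like a drift
print(simulate_level(offsetting))       # unit root: the effect is temporary
print(simulate_level(pulse, a1=0.5))    # stationary case: the pulse dies out
```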

Often, the shape of the intervention function is clear from a priori reasoning. 
When there is an ambiguity, estimate the plausible alternatives and then use the 
standard Box-Jenkins model selection criteria to choose the most appropriate
model. The following two examples illustrate the general estimation procedure. 


Estimating the Effect of Metal Detectors on Skyjackings 


The linear form of the intervention model $y_t = a_0 + A(L)y_{t-1} + c_0 z_t + B(L)\varepsilon_t$ assumes
that the coefficients are invariant to the intervention. A useful check of this assumption
is to pretest the data by estimating the most appropriate ARIMA($p$, $d$, $q$) models
for both the pre- and postintervention periods. If the two ARIMA models are
quite different, it is likely that the autoregressive and moving average coefficients 
have changed. Usually, there are not enough pre- and postintervention observations 
to estimate two separate models. In such instances, the researcher must be content 
to proceed using the best-fitting ARIMA model over the longest data span. The
estimation procedure can be organized into three steps:


STEP 1: Use the longest data span (i.e., either the pre- or postintervention
observations) to find a plausible set of ARIMA models.

You should be careful to ensure that the $\{y_t\}$ sequence is stationary. If
you suspect nonstationarity, you can perform unit root tests on the longest
span of data. Alternatively, you can use the Perron (1989) test for structural
change discussed in Chapter 4. In the presence of $d$ unit roots, estimate
the intervention model using the $d$th difference of $y_t$ (i.e., $\Delta^d y_t$).

In our study, we were interested in the effects of metal detectors on U.S.
domestic skyjackings, transnational skyjackings (including those involving
the United States), and all other skyjackings. Call each of these time
series $\{DS_t\}$, $\{TS_t\}$, and $\{OS_t\}$, respectively. Since there are only 5 years
of data (i.e., 20 observations) for the preintervention period, we estimated
the best-fitting ARIMA model over the 1973:1 to 1988:4 period. Using the
various criteria discussed in Chapter 2 (including diagnostic checks of the
residuals), we selected an AR(1) model for the $\{TS_t\}$ and $\{OS_t\}$ sequences
and a pure noise model (i.e., all autoregressive and moving average coefficients
equal to zero) for the $\{DS_t\}$ sequence.


STEP 2: Estimate the various models over the entire sample period including the
effect of the intervention.

The installation of metal detectors was tentatively viewed as an immediate
and permanent intervention. As such, we set $z_t = 0$ for t < 1973:1 and
$z_t = 1$ beginning in 1973:1. The results of the estimations over the entire
sample period are reported in Table 5.1. As you can see, the installation of
metal detectors reduced each of the three types of skyjacking incidents.
The most pronounced effect was on U.S. domestic skyjackings, which immediately
fell by over 5.6 incidents per quarter. For this series, the entire effect is immediate
since the estimate of $a_1$ is zero. The situation is somewhat different for the
$\{TS_t\}$ and $\{OS_t\}$ sequences since the estimated autoregressive coefficients
are different from zero. On impact, transnational skyjackings and other
types of skyjacking incidents fell by 1.29 and 3.90 incidents per quarter, respectively.
The long-run effects are estimated to be -1.78 and -5.11 incidents per
quarter.


STEP 3: Perform diagnostic checks of the estimated equations.

Diagnostic checking is particularly important since we have merged the 
observations from the pre- and postintervention periods. To reiterate the
discussion of ARIMA models, a well-estimated intervention model will 
have the following characteristics: 


1. The estimated coefficients should be of “high quality.” All coefficients
should be statistically significant at conventional levels. As in all
ARIMA modeling, we wish to use a parsimonious model. If any coefficient
is not significant, an alternative model should be considered.
Moreover, the autoregressive coefficients should imply that the $\{y_t\}$ sequence
is convergent.


2. The residuals should approximate white noise. If the residuals are seri- 
ally correlated, the estimated model does not mimic the actual data- 
generating process. Forecasts from the estimated model cannot possibly 
be making use of all available information. If the residuals do not ap- 
proximate a normal distribution, the usual tests of statistical inference 
are not valid. If the errors appear to be ARCH, the entire intervention 
model can be reestimated as an ARCH process. 


3. The tentative model should outperform plausible alternatives. Of
course, no one model can be expected to dominate all others in all
possible criteria. However, it is good practice to compare the results
of the maintained model to those of reasonable rivals. In the skyjacking
example, a plausible alternative was to model the intervention as
a gradually increasing process. This is particularly true since the impact
effect was immediate for U.S. domestic flights and convergent
for transnational and other domestic flights. Our conjecture was that
metal detectors were gradually installed in non-U.S. airports and,
even when installed, the enforcement was sporadic. As a check, we
modeled the intervention as gradually increasing over the year 1973.
Although the coefficients were nearly identical to those reported in
Table 5.1, the AIC and SBC were slightly lower (indicating a better
fit) using the gradually increasing process. Hence, it is reasonable to
conclude that metal detector adoption was more gradual outside of
the United States.

Table 5.1 Metal Detectors and Skyjackings

                              Preintervention               Impact Effect   Long-Run
                              Mean              $a_1$       ($c_0$)         Effect
Transnational $\{TS_t\}$      3.032             0.276       -1.29           -1.78
                              (5.96)            (2.51)      (-2.21)
U.S. domestic $\{DS_t\}$      6.70                          -5.62           -5.62
                              (12.02)                       (-8.73)
Other skyjackings $\{OS_t\}$  6.80              0.237       -3.90           -5.11
                              (7.93)            (2.14)      (-3.95)

Notes:
1. t-statistics are in parentheses.
2. The long-run effect is calculated as $c_0/(1 - a_1)$.
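
As a check on the last column of Table 5.1, the long-run effects can be recomputed from the reported point estimates using the formula in Note 2. The arithmetic below uses only the numbers in the table; for the U.S. domestic series the autoregressive coefficient is taken to be zero, as stated in the text.

```latex
\begin{aligned}
\{TS_t\}:\quad & \frac{c_0}{1 - a_1} = \frac{-1.29}{1 - 0.276} = \frac{-1.29}{0.724} \approx -1.78 \\
\{DS_t\}:\quad & \frac{c_0}{1 - a_1} = \frac{-5.62}{1 - 0} = -5.62 \\
\{OS_t\}:\quad & \frac{c_0}{1 - a_1} = \frac{-3.90}{1 - 0.237} = \frac{-3.90}{0.763} \approx -5.11
\end{aligned}
```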




Estimating the Effect of the Libyan Bombing 


We also considered the effects of the U.S. bombing of Libya on the morning of 
April 15, 1986. The stated reason for the attack was Libya’s alleged involvement in 
the terrorist bombing of the La Belle Discotheque in West Berlin. Since 18 of the 
F-111 fighter-bombers were deployed from British bases at Lakenheath and Upper 
Heyford, England, the U.K. implicitly assisted in the raid. The remaining U.S. 
planes were deployed from aircraft carriers in the Mediterranean Sea. Now let $y_t$
denote all transnational terrorist incidents directed against the United States and
the U.K. during month $t$. A plot of the $\{y_t\}$ sequence exhibited a large positive spike
immediately after the bombing; the immediate effect seemed to be a wave of
anti-U.S. and anti-U.K. attacks to protest the retaliatory strike.

Preliminary estimates of the monthly data from January 1968 to March 1986 indicated
that the $\{y_t\}$ sequence could be estimated as a purely autoregressive model


[...] coefficient at lag 5, but both the AIC and SBC indicate that the fifth lag is important.
Nevertheless, we estimated versions of the model with and without the fifth lag. In
addition, we considered two possible patterns for the intervention series. For the
first, $\{z_t\}$ was modeled as zero until April 1986 and 1 in all subsequent periods.


Using this specification, we obtained the following estimates (with t-statistics in 
parentheses): 


$$y_t = 5.58 + 0.336\,y_{t-1} + 0.123\,y_{t-5} + 2.65\,z_t$$
(5.56) (3.26) (0.84)


AIC = 1656.03, SBC = 1669.95


Note that the coefficient of $z_t$ has a t-statistic of 0.84 (which is not significant at
the 0.05 level). Alternatively, when $z_t$ was allowed to be 1 only in the month of the
attack, we obtained


$$y_t = 3.79 + 0.327\,y_{t-1} + 0.157\,y_{t-5} + 38.9\,z_t$$
(5.53) (2.59) (6.09)

AIC = 1608.68, SBC = 1626.06

In comparing the two estimates, it is clear that magnitudes of the autoregressive 
coefficients are similar. Although Q-tests indicated that the residuals from both 
models approximate white noise, the pulse specification is preferable. The coeffi- 
cient on the pulse term is highly significant and both the AIC and SBC select the 
second specification. Our conclusion was that the Libyan bombing did not have the 
desired effect of reducing terrorist attacks against the United States and the U.K. 
Instead, the bombing caused an immediate increase of over 38 attacks. Subsequently,
the number of attacks declined; 32.7% of these attacks are estimated to persist
for one period (0.327 x 38.9 = 12.7). Since the autoregressive coefficients imply
convergence, the long-run consequences of the raid were estimated to be zero.
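
The decay of the pulse effect can be traced from the reported point estimates of the second specification. The sketch below is illustrative rather than the authors' calculation: the function name is hypothetical, and it simply iterates the autoregressive part (lags 1 and 5) on a one-time unit pulse scaled by the coefficient 38.9.

```python
def pulse_response(c0, ar_coeffs, horizon):
    """Response of y_{t+j}, j = 0..horizon, to a one-time unit pulse in z_t.

    ar_coeffs maps lag -> autoregressive coefficient, e.g. {1: 0.327, 5: 0.157}.
    """
    weights = []
    for j in range(horizon + 1):
        if j == 0:
            value = 1.0
        else:
            value = sum(coef * weights[j - lag]
                        for lag, coef in ar_coeffs.items() if j - lag >= 0)
        weights.append(value)
    return [c0 * w for w in weights]


# Point estimates reported in the text for the pulse specification.
responses = pulse_response(38.9, {1: 0.327, 5: 0.157}, horizon=12)
for month, effect in enumerate(responses):
    print(month, round(effect, 2))
```

Because of the fifth lag, the response is not a pure geometric decay: a small secondary bump appears five months after the raid before the effect fades toward zero.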




2. TRANSFER FUNCTION MODELS 


A natural extension of the intervention model is to allow the $\{z_t\}$ sequence to be
something other than a deterministic dummy variable. Consider the following
generalization of the intervention model:


$$y_t = a_0 + A(L)\,y_{t-1} + C(L)\,z_t + B(L)\,\varepsilon_t \tag{5.3}$$


where $A(L)$, $B(L)$, and $C(L)$ = polynomials in the lag operator $L$


In a typical transfer function analysis, the researcher will collect data on the
endogenous variable $\{y_t\}$ and exogenous variable $\{z_t\}$. The goal is to estimate the
parameter $a_0$ and parameters of the polynomials $A(L)$, $B(L)$, and $C(L)$. The major
difference between (5.3) and the intervention model is that $\{z_t\}$ is not constrained to
have a particular deterministic time path. The intervention variable is allowed to be
any exogenous stochastic process. The polynomial $C(L)$ is called the transfer function
in that it shows how a movement in the exogenous variable $z_t$ affects the time
path of (i.e., is transferred to) the endogenous variable $\{y_t\}$. The coefficients of
$C(L)$, denoted by $c_i$, are called transfer function weights. The impulse response
function showing the effects of a $z_t$ shock on the $\{y_t\}$ sequence is given by
$C(L)/[1 - A(L)L]$.

It is critical to note that transfer function analysis assumes that $\{z_t\}$ is an exogenous
process that evolves independently of the $\{y_t\}$ sequence. Innovations in $\{y_t\}$
are assumed to have no effect on the $\{z_t\}$ sequence, so that $E z_t \varepsilon_{t-s} = 0$ for all values
of $s$ and $t$. Since $z_t$ can be observed and is uncorrelated with the current innovation
in $y_t$ (i.e., the disturbance term $\varepsilon_t$), the current and lagged values of $z_t$ are explanatory
variables for $y_t$. Let $C(L)$ be $c_0 + c_1 L + c_2 L^2 + \cdots$. If $c_0 = 0$, the contemporaneous
value of $z_t$ does not directly affect $y_t$. As such, $\{z_t\}$ is called a leading indicator
in that the observations $z_t, z_{t-1}, z_{t-2}, \ldots$ can be used in predicting future values of
the $\{y_t\}$ sequence.²
It is easy to conceptualize numerous applications for (5.3). After all, a large part
of dynamic economic analysis concerns the effects of an “exogenous” or “independent”
sequence $\{z_t\}$ on the time path of an endogenous sequence $\{y_t\}$. For example,
much of the current research in agricultural economics concerns the effects of the
macroeconomy on the agricultural sector. If we use (5.3), farm output $\{y_t\}$ is affected
by its own past, as well as the current and past state of the macroeconomy
$\{z_t\}$. The effects of macroeconomic fluctuations on farm output can be captured
by the coefficients of $C(L)$. Here, $B(L)\varepsilon_t$ represents the unexplained portion of farm
output. Alternatively, the level of ozone in the atmosphere $\{y_t\}$ is a naturally evolving
process; hence, in the absence of other outside influences, we should expect the
ozone level to be well represented by an ARIMA model. However, many have argued
that the use of fluorocarbons has damaged the ozone layer. Because of a cumulative
effect, it is argued that current and past values of fluorocarbon usage affect
the value of $y_t$. By letting $z_t$ denote fluorocarbon usage in $t$, it is possible to
model the effects of fluorocarbon usage on the ozone layer using a model in the
form of (5.3). The natural dissipation of ozone is captured through the coefficients
of $A(L)$. Stochastic shocks to the ozone layer, possibly due to electrical storms and
the presence of measurement errors, are captured by $B(L)\varepsilon_t$. The contemporaneous
effect of fluorocarbons on the ozone layer is captured by the coefficient $c_0$ and the
lagged effects by the other transfer function weights (i.e., the values of the various
$c_i$).

In contrast to the pure intervention model, there is no preintervention versus 
postintervention period, so that we cannot estimate a transfer function in the same 
fashion that we estimated an intervention model. However, the methods are very 
similar in that the goal is to estimate a parsimonious model. The procedure in- 
volved in fitting a transfer function model is easiest to explain by considering a 
simple case of (5.3). To begin, suppose $\{z_t\}$ is generated by a white-noise process
that is uncorrelated with $\varepsilon_t$ at all leads and lags. Also suppose that the realization of
$z_t$ affects the $\{y_t\}$ sequence with a lag of unknown duration. Specifically, let


$$y_t = a_1 y_{t-1} + c_d z_{t-d} + \varepsilon_t \tag{5.4}$$


where $\{z_t\}$ and $\{\varepsilon_t\}$ are white-noise processes such that $E(z_t \varepsilon_{t-i}) = 0$; $a_1$ and $c_d$ are
unknown coefficients, and $d$ is the “delay” or lag duration to be determined by the
econometrician.


Since $\{z_t\}$ and $\{\varepsilon_t\}$ are assumed to be independent white-noise processes, it is
possible to separately model the effects of each type of shock. Since we can observe
the various $z_t$ values, the first step is to calculate the cross-correlations between $y_t$
and the various $z_{t-i}$. The cross-correlation between $y_t$ and $z_{t-i}$ is defined to be


$$\rho_{yz}(i) = \mathrm{cov}(y_t, z_{t-i})/(\sigma_y \sigma_z) \tag{5.5}$$


where $\sigma_y$ and $\sigma_z$ = the standard deviations of $y_t$ and $z_t$, respectively


Notice that the standard deviation of each sequence is assumed to be time-independent.
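
A sample analogue of (5.5) can be sketched as follows. This is one common convention, not necessarily the one used by any particular software package: means and standard deviations are computed over the full sample, and the covariance sum is divided by the full sample size. The function name and the small data set are hypothetical.

```python
from math import sqrt


def sample_cross_correlation(y, z, lag):
    """Sample correlation between y_t and z_{t-lag}, for lag >= 0."""
    if len(y) != len(z):
        raise ValueError("y and z must have the same length")
    n = len(y)
    y_bar = sum(y) / n
    z_bar = sum(z) / n
    # Cross-covariance uses the n - lag overlapping pairs, divided by n.
    cov = sum((y[t] - y_bar) * (z[t - lag] - z_bar) for t in range(lag, n)) / n
    sd_y = sqrt(sum((v - y_bar) ** 2 for v in y) / n)
    sd_z = sqrt(sum((v - z_bar) ** 2 for v in z) / n)
    return cov / (sd_y * sd_z)


# Made-up data in which y simply repeats z with a delay of two periods.
z = [1, -1, 2, 0, -2, 1, 3, -1, 0, -3]
y = [0, 0] + z[:-2]

r0 = sample_cross_correlation(y, z, 0)
r2 = sample_cross_correlation(y, z, 2)
print(round(r0, 3), round(r2, 3))
```

In this constructed example the largest cross-correlation appears at lag 2, the delay built into the data.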

Plotting each value of $\rho_{yz}(i)$ yields the cross-autocorrelation function (CACF) or
cross-correlogram. In practice, we must use the cross-correlations calculated using
sample data since we do not know the true covariances and standard deviations.
The key point is that the sample cross-correlations provide the same type of information
as the ACF in an ARMA model. To explain, solve (5.4) to obtain:


$$y_t = c_d z_{t-d}/(1 - a_1 L) + \varepsilon_t/(1 - a_1 L)$$

Use the properties of lag operators to expand the expression $c_d z_{t-d}/(1 - a_1 L)$:


$$y_t = c_d\,(z_{t-d} + a_1 z_{t-d-1} + a_1^2 z_{t-d-2} + a_1^3 z_{t-d-3} + \cdots) + \varepsilon_t/(1 - a_1 L)$$


Analogously to our derivation of the Yule-Walker equations, we can obtain the
cross-covariances by the successive multiplication of $y_t$ by $z_t, z_{t-1}, \ldots$ to form


$$
\begin{aligned}
y_t z_t &= c_d\,(z_t z_{t-d} + a_1 z_t z_{t-d-1} + a_1^2 z_t z_{t-d-2} + \cdots) + z_t \varepsilon_t/(1 - a_1 L) \\
y_t z_{t-1} &= c_d\,(z_{t-1} z_{t-d} + a_1 z_{t-1} z_{t-d-1} + a_1^2 z_{t-1} z_{t-d-2} + \cdots) + z_{t-1} \varepsilon_t/(1 - a_1 L) \\
&\;\;\vdots \\
y_t z_{t-d} &= c_d\,(z_{t-d} z_{t-d} + a_1 z_{t-d} z_{t-d-1} + a_1^2 z_{t-d} z_{t-d-2} + \cdots) + z_{t-d} \varepsilon_t/(1 - a_1 L) \\
y_t z_{t-d-1} &= c_d\,(z_{t-d-1} z_{t-d} + a_1 z_{t-d-1} z_{t-d-1} + a_1^2 z_{t-d-1} z_{t-d-2} + \cdots) + z_{t-d-1} \varepsilon_t/(1 - a_1 L) \\
y_t z_{t-d-2} &= c_d\,(z_{t-d-2} z_{t-d} + a_1 z_{t-d-2} z_{t-d-1} + a_1^2 z_{t-d-2} z_{t-d-2} + \cdots) + z_{t-d-2} \varepsilon_t/(1 - a_1 L)
\end{aligned}
$$

Now take the expected value of each of the above equations. If we continue to
assume that $\{z_t\}$ and $\{\varepsilon_t\}$ are independent white-noise disturbances, it follows that


$$
\begin{aligned}
E y_t z_t &= 0 \\
E y_t z_{t-1} &= 0 \\
&\;\;\vdots \\
E y_t z_{t-d} &= c_d \sigma_z^2 \\
E y_t z_{t-d-1} &= c_d a_1 \sigma_z^2 \\
E y_t z_{t-d-2} &= c_d a_1^2 \sigma_z^2
\end{aligned}
$$


so that in compact form,


$$
E y_t z_{t-i} =
\begin{cases}
0 & \text{for all } i < d \\
c_d\, a_1^{\,i-d}\, \sigma_z^2 & \text{for } i \ge d
\end{cases}
\tag{5.6}
$$


Dividing each value of $E y_t z_{t-i} = \mathrm{cov}(y_t, z_{t-i})$ by $\sigma_y \sigma_z$ yields the cross-correlogram.
Note that the cross-correlogram consists of zeroes until lag $d$. The absolute value of
the height of the first nonzero cross-correlation is positively related to the magnitudes
of $c_d$ and $a_1$. Thereafter, the cross-correlations decay at the rate $a_1$. The decay of the
correlogram matches the autoregressive patterns of the $\{y_t\}$ sequence.

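The pattern exhibited by (5.6) is easily generalized. Suppose we allow both $z_{t-d}$
and $z_{t-d-1}$ to directly affect $y_t$:


$$y_t = a_1 y_{t-1} + c_d z_{t-d} + c_{d+1} z_{t-d-1} + \varepsilon_t$$

Solving for $y_t$, we obtain

$$
\begin{aligned}
y_t &= (c_d z_{t-d} + c_{d+1} z_{t-d-1})/(1 - a_1 L) + \varepsilon_t/(1 - a_1 L) \\
    &= c_d\,(z_{t-d} + a_1 z_{t-d-1} + a_1^2 z_{t-d-2} + a_1^3 z_{t-d-3} + \cdots) \\
    &\quad + c_{d+1}\,(z_{t-d-1} + a_1 z_{t-d-2} + a_1^2 z_{t-d-3} + a_1^3 z_{t-d-4} + \cdots) + \varepsilon_t/(1 - a_1 L)
\end{aligned}
$$


so that


$$
\begin{aligned}
y_t &= c_d z_{t-d} + (c_d a_1 + c_{d+1})\, z_{t-d-1} + a_1 (c_d a_1 + c_{d+1})\, z_{t-d-2} \\
    &\quad + a_1^2 (c_d a_1 + c_{d+1})\, z_{t-d-3} + \cdots + \varepsilon_t/(1 - a_1 L)
\end{aligned}
\tag{5.7}
$$


Forming the standardized cross-covariances reveals the following pattern:³


$$
\mathrm{cov}(y_t, z_{t-i})/\sigma_z^2 =
\begin{cases}
0 & \text{for } i < d \\
c_d & \text{for } i = d \\
c_d a_1 + c_{d+1} & \text{for } i = d + 1 \\
a_1^{\,j-1} (c_d a_1 + c_{d+1}) & \text{for } i = d + j \;\; (j \ge 1)
\end{cases}
$$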

The upper-left-hand graph (a) of Figure 5.3 shows the shape of the standardized
cross-correlogram for $d = 3$, $c_d = 1$, $c_{d+1} = 1.5$, and $a_1 = 0.8$. Note that there are distinct
spikes at lags 3 and 4 corresponding to the nonzero values of $c_3$ and $c_4$.
Thereafter, the cross-correlations decay at the rate $a_1$. The upper-right-hand graph
(b) of the figure replaces $c_4$ with the value -1.5. Again, all cross-correlations are
zero until lag 3; since $c_3 = 1$, the standardized value of $\rho_{yz}(3) = 1$. To find the
standardized value of $\rho_{yz}(4)$, form: $\rho_{yz}(4) = 0.8 - 1.5 = -0.7$. The subsequent values of
$\rho_{yz}(i)$ decay at the rate 0.8. The pattern illustrated by these two examples generalizes
to any intervention model of the form:


$$y_t = a_0 + a_1 y_{t-1} + C(L)\,z_t + B(L)\,\varepsilon_t \tag{5.8}$$


The theoretical cross-correlogram has a shape with the following characteristics: 


1. All $\rho_{yz}(i)$ will be zero until the first nonzero element of the polynomial $C(L)$.


2. The form of $B(L)$ is immaterial to the theoretical cross-correlogram. Since $z_t$ is
uncorrelated with $\varepsilon_t$ at all leads and lags, the form of the polynomial $B(L)$ will
not affect any of the theoretical cross-correlations $\rho_{yz}(i)$. Obviously, the intercept
term $a_0$ does not affect any of the cross-covariances or cross-correlations.


3. A spike in the CACF indicates a nonzero element of $C(L)$. Thus, a spike at lag $d$
indicates that $z_{t-d}$ directly affects $y_t$.


4. All spikes decay at the rate $a_1$; convergence implies that the absolute value of $a_1$
is less than unity. If $0 < a_1 < 1$, decay in the cross-correlations will be direct,
whereas if $-1 < a_1 < 0$, the decay pattern will be oscillatory.
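
These characteristics can be reproduced numerically for the first-order case. The sketch below is an illustration with a hypothetical function name: it assumes white-noise $z_t$ and computes the standardized cross-covariances $\mathrm{cov}(y_t, z_{t-i})/\sigma_z^2$ through the recursion $W_i = a_1 W_{i-1} + c_i$, which is the pattern shown in (5.6) and (5.7). The parameter values are those quoted for panels (a) and (b) of Figure 5.3.

```python
def standardized_cross_correlogram(a1, c_weights, max_lag):
    """cov(y_t, z_{t-i}) / var(z) for y_t = a1*y_{t-1} + C(L) z_t + noise.

    c_weights maps lag -> transfer function weight c_i; z_t is assumed white noise.
    """
    values = []
    previous = 0.0
    for i in range(max_lag + 1):
        current = a1 * previous + c_weights.get(i, 0.0)   # W_i = a1*W_{i-1} + c_i
        values.append(current)
        previous = current
    return values


panel_a = standardized_cross_correlogram(0.8, {3: 1.0, 4: 1.5}, max_lag=8)
panel_b = standardized_cross_correlogram(0.8, {3: 1.0, 4: -1.5}, max_lag=8)
print([round(v, 3) for v in panel_a])
print([round(v, 3) for v in panel_b])
```

Both sequences are zero through lag 2, show spikes at lags 3 and 4, and then shrink by the factor 0.8 at each further lag.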


Only the nature of the decay process changes if we generalize Equation (5.8) to
include additional lags of $y_{t-i}$. In the general case of (5.3), the decay pattern in the
cross-correlations is determined by the characteristic roots of the polynomial $A(L)$;
the shape is precisely that suggested by the autocorrelations of a pure ARMA
model. This should not come as a surprise; in the examples of (5.4) and (5.8), the
decay factor was simply the first-order autocorrelation coefficient $a_1$. We know that
there will be decay since all characteristic roots of $1 - A(L)L$ must be outside of the
unit circle for the process to be stationary. Convergence will be direct if the roots
are positive and will tend to oscillate if a root is negative. Imaginary roots impart a
sine-wave pattern to the decay process.

[Figure 5.3 Standardized cross-correlograms. Panels: (a) $y_t = 0.8y_{t-1} + z_{t-3} + 1.5z_{t-4} + \varepsilon_t$; (b) $y_t = 0.8y_{t-1} + z_{t-3} - 1.5z_{t-4} + \varepsilon_t$; (c) $y_t = 0.8y_{t-1} - 0.6y_{t-2} + z_{t-3} + \varepsilon_t$; (d) $y_t = 0.8y_{t-1} + 0.2y_{t-2} + z_{t-3} + \varepsilon_t$. Horizontal axis: lags 0 to 20.]


The Cross-Covariances of a Second-Order Process 


To use another example, consider the transfer function: 


$$y_t = a_1 y_{t-1} + a_2 y_{t-2} + c_d z_{t-d} + \varepsilon_t$$


Using lag operators to solve for $y_t$ is inconvenient since we do not know the
numerical values of $a_1$ and $a_2$. Instead, use the method of undetermined coefficients
and form the challenge solution:


$$y_t = \sum_{i=0}^{\infty} W_i z_{t-i} + \sum_{i=0}^{\infty} V_i \varepsilon_{t-i}$$


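You should be able to verify that the values of $W_i$ are given by


$$
\begin{aligned}
W_0 &= W_1 = \cdots = W_{d-1} = 0 \\
W_d &= c_d \\
W_{d+1} &= c_d a_1 \\
W_{d+2} &= c_d (a_1^2 + a_2) \\
W_{d+3} &= a_1 W_{d+2} + a_2 W_{d+1} \\
W_{d+4} &= a_1 W_{d+3} + a_2 W_{d+2}
\end{aligned}
$$


Thus, for all $i > d + 1$, the successive coefficients satisfy the difference equation
$W_i = a_1 W_{i-1} + a_2 W_{i-2}$. At this stage, we are not interested in the values of the
various $V_i$, so that it is sufficient to write the solution for $y_t$ as


$$y_t = c_d z_{t-d} + c_d a_1 z_{t-d-1} + c_d (a_1^2 + a_2) z_{t-d-2} + c_d (a_1^3 + 2 a_1 a_2) z_{t-d-3} + \cdots + \sum_{i=0}^{\infty} V_i \varepsilon_{t-i}$$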


Next, use this solution for $y_t$ to form all cross-covariances using the Yule-Walker
equations. Forming the expressions for $E y_t z_{t-i}$, we get


$$
\begin{aligned}
E y_t z_{t-i} &= 0 \quad \text{for } i < d \;\; (\text{since } E z_t z_{t-i} = 0 \text{ for } i \ne 0) \\
E y_t z_{t-d} &= c_d \sigma_z^2 \\
E y_t z_{t-d-1} &= c_d a_1 \sigma_z^2 \\
E y_t z_{t-d-2} &= c_d (a_1^2 + a_2) \sigma_z^2
\end{aligned}
$$


Thus, there is an initial spike at lag $d$ reflecting the nonzero value of $c_d$. After
one period, the proportion $a_1$ of the value $c_d$ remains. After two periods (the number of
autoregressive lags in the transfer function), the decay pattern in the cross-covariances
begins to satisfy the difference equation:


$$\rho_{yz}(i) = a_1 \rho_{yz}(i - 1) + a_2 \rho_{yz}(i - 2)$$


The lower-left-hand graph (c) of Figure 5.3 shows the shape of the CACF for the
case of $d = 3$, $c_d = 1$, $a_1 = 0.8$, and $a_2 = -0.6$. The oscillatory pattern reflects the fact
that the characteristic roots of the process are imaginary. For purposes of comparison,
the lower-right-hand graph (d) shows the standardized CACF of a unit root
process. The fact that one of the characteristic roots is equal to unity means that a
$z_{t-3}$ shock has a permanent effect on the $\{y_t\}$ sequence.
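
The weights behind panels (c) and (d) follow from the difference equation for $W_i$. The sketch below is illustrative, with hypothetical function names; it iterates $W_i = a_1 W_{i-1} + a_2 W_{i-2}$ with the impact $c_d$ added at lag $d$, and uses the sign of the discriminant $a_1^2 + 4a_2$ to indicate whether the characteristic roots are complex.

```python
def transfer_weights(a1, a2, c_d, d, max_lag):
    """Weights W_i on z_{t-i} for y_t = a1*y_{t-1} + a2*y_{t-2} + c_d*z_{t-d} + noise."""
    w = [0.0] * (max_lag + 1)
    for i in range(d, max_lag + 1):
        impact = c_d if i == d else 0.0
        lag1 = w[i - 1] if i >= 1 else 0.0
        lag2 = w[i - 2] if i >= 2 else 0.0
        w[i] = a1 * lag1 + a2 * lag2 + impact
    return w


def roots_are_complex(a1, a2):
    """True when x**2 - a1*x - a2 = 0 has complex roots (negative discriminant)."""
    return a1 ** 2 + 4 * a2 < 0


panel_c = transfer_weights(0.8, -0.6, c_d=1.0, d=3, max_lag=40)
panel_d = transfer_weights(0.8, 0.2, c_d=1.0, d=3, max_lag=40)
print(roots_are_complex(0.8, -0.6), [round(v, 4) for v in panel_c[3:9]])
print(roots_are_complex(0.8, 0.2), [round(v, 4) for v in panel_d[3:9]])
```

For the panel (d) parameters the characteristic roots are 1 and -0.2, so the weights do not die out; they settle at a nonzero constant, which is the permanent effect described in the text.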

The econometrician will rarely be so fortunate as to work with a $\{z_t\}$ series that is
white-noise. We need to further generalize our discussion of transfer functions to
consider the case in which the $\{z_t\}$ sequence is a stationary ARMA process. Let the
model for the $\{z_t\}$ sequence be an ARMA process such that


$$D(L)\,z_t = E(L)\,\varepsilon_{zt}$$


where $D(L)$ and $E(L)$ = polynomials in the lag operator $L$
$\varepsilon_{zt}$ = white-noise


At this point, we can use the methodology developed in Chapter 2 to estimate the
ARMA process generating the $\{z_t\}$ sequence. The residuals from such a model,
denoted by $\{\hat{\varepsilon}_{zt}\}$, should be white-noise. The idea is to estimate the innovations in the
$\{z_t\}$ sequence even though the sequence itself is not a white-noise process. At this
point, it is tempting to think that we should form the cross-correlations between the
$\{y_t\}$ sequence and $\{\hat{\varepsilon}_{zt-i}\}$. However, this procedure would be inconsistent with the
maintained hypothesis that the structure of the transfer function is given by (5.3).
Reproducing (5.3) for your convenience, we get


$$y_t = a_0 + A(L)\,y_{t-1} + C(L)\,z_t + B(L)\,\varepsilon_t$$


Here, $z_t, z_{t-1}, z_{t-2}, \ldots$ (and not simply the innovations) directly affect the value of
$y_t$. Cross-correlations between $y_t$ and the various $\hat{\varepsilon}_{zt-i}$ would not reveal the pattern
of the coefficients in $C(L)$. The appropriate methodology is to filter the $\{y_t\}$ sequence
by multiplying (5.3) by the previously estimated polynomial $D(L)/E(L)$. As
such, the filtered value of $y_t$ is $D(L)y_t/E(L)$ and denoted by $y_{ft}$. Consider


$$D(L)y_t/E(L) = D(L)a_0/E(L) + D(L)A(L)y_{t-1}/E(L) + C(L)D(L)z_t/E(L) + B(L)D(L)\varepsilon_t/E(L) \tag{5.9}$$


Given that $D(L)y_t/E(L) = y_{ft}$, $D(L)y_{t-1}/E(L) = y_{ft-1}$, and $D(L)z_t/E(L) = \varepsilon_{zt}$, (5.9) is
equivalent to


$$y_{ft} = D(L)a_0/E(L) + A(L)\,y_{ft-1} + C(L)\,\varepsilon_{zt} + B(L)D(L)\varepsilon_t/E(L) \tag{5.10}$$
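
The filtering step behind (5.9) and (5.10) can be sketched as a simple recursion. This is a minimal illustration under stated assumptions, not a reference implementation: the function name is hypothetical, the sign conventions $D(L) = 1 - d_1 L - d_2 L^2 - \cdots$ and $E(L) = 1 + e_1 L + e_2 L^2 + \cdots$ are assumed, and pre-sample values are set to zero. Applying the filter to $\{z_t\}$ recovers its innovations; applying the same filter to $\{y_t\}$ gives the filtered series $y_{ft}$.

```python
def arma_filter(x, ar, ma):
    """Apply D(L)/E(L) to the sequence x.

    ar = [d_1, d_2, ...] with D(L) = 1 - d_1 L - d_2 L^2 - ... (assumed convention)
    ma = [e_1, e_2, ...] with E(L) = 1 + e_1 L + e_2 L^2 + ... (assumed convention)
    Pre-sample values of x and of the filtered series are treated as zero.
    """
    filtered = []
    for t in range(len(x)):
        value = x[t]
        for i, d_i in enumerate(ar, start=1):
            if t - i >= 0:
                value -= d_i * x[t - i]
        for j, e_j in enumerate(ma, start=1):
            if t - j >= 0:
                value -= e_j * filtered[t - j]
        filtered.append(value)
    return filtered


# Made-up example: z follows z_t = 0.5*z_{t-1} + e_t with innovations e = [1, 0, 0, 2, 0].
z = [1.0, 0.5, 0.25, 2.125, 1.0625]
innovations = arma_filter(z, ar=[0.5], ma=[])
print(innovations)

# The same filter would then be applied to y to obtain the filtered y series.
y = [0.0, 1.0, 0.5, 0.25, 2.125]
y_filtered = arma_filter(y, ar=[0.5], ma=[])
print(y_filtered)
```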


Although you can construct the sequence $D(L)y_t/E(L)$, most software packages
can make the appropriate transformations automatically. Now compare (5.3) and
(5.10). You can see that $y_t$ and $C(L)z_t$ will have the same correlogram as $y_{ft}$ and
$C(L)\varepsilon_{zt}$. Thus, when we form the cross-correlations between $y_{ft}$ and $\varepsilon_{zt-i}$, the
cross-correlations will be the same as those from (5.3). As in the case in which $\{z_t\}$ was
originally white-noise, we can inspect these cross-correlations for spikes and the
decay pattern. In summary, the full procedure for fitting a transfer function entails:


STEP 1: Fit an ARMA model to the $\{z_t\}$ sequence. The technique used at this stage
is precisely that for estimating any ARMA model. A properly estimated
ARMA model should approximate the data-generating process for the $\{z_t\}$
sequence. The calculated residuals $\{\hat{\varepsilon}_{zt}\}$ are called the filtered values of
the $\{z_t\}$ series. These filtered values can be interpreted as the pure innovations
in the $\{z_t\}$ sequence. Calculate and store the $\{\hat{\varepsilon}_{zt}\}$ sequence.


STEP 2: Obtain the filtered $\{y_t\}$ sequence by applying the filter $D(L)/E(L)$ to each
value of $\{y_t\}$; that is, use the results of Step 1 to obtain $[D(L)/E(L)]\,y_t = y_{ft}$.
Form the cross-correlogram between $y_{ft}$ and $\hat{\varepsilon}_{zt-i}$. Of course, these sample
correlations will not precisely conform to their theoretical values. Under
the null hypothesis that the cross-correlations are all zero, the sample
variance of cross-correlation coefficient $i$ asymptotically converges to
$(T - i)^{-1}$, where $T$ = number of usable observations. Let $r_{yz}(i)$ denote the
sample cross-correlation coefficient between $y_t$ and $z_{t-i}$. Under the null
hypothesis that all the true values of $\rho_{yz}(i)$ are equal to zero, the variance of
$r_{yz}(i)$ converges to


$$\mathrm{Var}[r_{yz}(i)] = (T - i)^{-1}$$


For example, with 100 usable observations, the standard deviation of
the cross-correlation coefficient between $y_t$ and $z_{t-1}$ is 1 divided by the square
root of 99 (approximately equal to 0.10). If the calculated value of $r_{yz}(1)$ exceeds 0.2
(or is less than -0.2), the null hypothesis can be rejected. Significant
cross-correlations at lag $i$ indicate that an innovation in $z_t$ affects the value
of $y_{t+i}$. To test the significance of the first $k$ cross-correlations, use the
statistic:


$$Q = T(T + 2) \sum_{i=0}^{k} r_{yz}^2(i)/(T - i)$$


Asymptotically, $Q$ has a $\chi^2$ distribution with $(k - p_1 - p_2)$ degrees of
freedom, where $p_1$ and $p_2$ denote the number of nonzero coefficients in
$A(L)$ and $C(L)$, respectively.


STEP 3: Examine the pattern of the cross-correlogram. Just as the ACF can be used
as a guide in identifying an ARMA model, the CACF can help identify the
form of A(L) and C(L). Spikes in the cross-correlogram indicate nonzero
values of c_i. The decay pattern of the cross-correlations suggests plausible
candidates for coefficients of A(L). This decay pattern is perfectly analogous
to the ACF in a traditional ARMA model. In practice, examination of
the cross-correlogram will suggest several plausible transfer functions.
Estimate each of these plausible models and select the “best-fitting”
model. At this point, you will have selected a model of the form:

$$[1 - A(L)L]y_t = C(L)z_t + e_t$$

where e_t denotes the error term that is not necessarily white-noise.


STEP 4: The {e_t} sequence obtained in Step 3 is an approximation of B(L)ε_t. As
such, the ACF of these residuals can suggest the appropriate form for the
B(L) function. If the {e_t} sequence appears to be white-noise, your task is
complete. However, the correlogram of the {e_t} sequence will usually suggest
a number of plausible forms for B(L). Use the {e_t} sequence to estimate
the various forms of B(L) and select the “best” model for B(L)ε_t.


STEP 5: Combine the results of Steps 3 and 4 to estimate the full equation. At this
stage, you will estimate A(L), B(L), and C(L) simultaneously. The properties
of a well-estimated model are such that the coefficients are of high
quality, the model is parsimonious, the residuals conform to a white-noise
process, and the forecast errors are small. You should compare your estimated
model to the other plausible candidates from Steps 3 and 4.
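
Steps 1 and 2 can be sketched numerically. The example below is only an illustration under stated assumptions: the data are simulated from a hypothetical process (z_t is an AR(1) with coefficient 0.7, and y_t responds to z_{t-2} with a coefficient of 2), the ARMA model for {z_t} is assumed to be a pure AR(1) estimated by OLS, and the helper name `cross_correlations` is invented for this sketch. It filters both series with the estimated AR(1) polynomial and then forms the cross-correlogram between the filtered series.

```python
import numpy as np

def cross_correlations(y, x, max_lag):
    """Sample correlations between y_t and x_{t-i} for i = 0..max_lag."""
    y = y - y.mean()
    x = x - x.mean()
    T = len(y)
    scale = T * y.std() * x.std()
    return np.array([np.sum(y[i:] * x[:T - i]) / scale
                     for i in range(max_lag + 1)])

rng = np.random.default_rng(0)
T = 2000
eps_z = rng.standard_normal(T)
eps_y = 0.5 * rng.standard_normal(T)

# Hypothetical data-generating process (illustrative values only):
#   z_t = 0.7 z_{t-1} + eps_zt
#   y_t = 0.5 y_{t-1} + 2 z_{t-2} + eps_yt
z = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    z[t] = 0.7 * z[t - 1] + eps_z[t]
    y[t] = 0.5 * y[t - 1] + (2.0 * z[t - 2] if t >= 2 else 0.0) + eps_y[t]

# Step 1: fit an AR(1) to {z_t} by OLS; the residuals are the filtered z values.
zc = z - z.mean()
phi_hat = (zc[1:] @ zc[:-1]) / (zc[:-1] @ zc[:-1])
z_filtered = zc[1:] - phi_hat * zc[:-1]

# Step 2: apply the same filter (1 - phi_hat L) to {y_t}, then cross-correlate.
yc = y - y.mean()
y_filtered = yc[1:] - phi_hat * yc[:-1]
ccf = cross_correlations(y_filtered, z_filtered, max_lag=6)

band = 2.0 / np.sqrt(len(y_filtered))  # rough two-standard-deviation band
for lag, value in enumerate(ccf):
    flag = "*" if abs(value) > band else " "
    print(f"lag {lag}: {value: .3f} {flag}")
```

In this simulated setting the cross-correlogram should be close to zero at lags 0 and 1, show a spike at lag 2 (the assumed delay), and decay afterward, which is the pattern Step 3 asks you to read.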


There is no doubt that estimating a transfer function involves judgment on the
part of the researcher. Experienced econometricians would agree that the procedure
is a blend of skill, art, and perseverance that is developed through practice.
Nevertheless, there are some hints that can be quite helpful.


1. After we estimate the full model in Step 5, any remaining autocorrelation in the
residuals probably means that B(L) is misspecified. Return to Step 4 and reformulate
the form of B(L) so as to capture the remaining explanatory power of the
residuals.

2. After we estimate the full model in Step 5, if the residuals are correlated with
{z_t}, the C(L) function is probably misspecified. Return to Step 3 and reformulate
the specifications of A(L) and C(L).

3. Instead of estimating {e_t} as a pure autoregressive process, you can estimate
B(L) as an ARMA process. Thus, e_t = B(L)ε_t is allowed to have the form
e_t = G(L)ε_t/H(L). Here, G(L) and H(L) are low-order polynomials in the lag
operator L. The benefit is that a high-order autoregressive process can often be
approximated by a low-order ARMA model.

4. The sample cross-correlations are not meaningful if {y_t} and/or {z_t} are not
stationary. You can test each for a unit root using the procedures discussed in
Chapter 4. In the presence of unit roots, Box and Jenkins (1976) recommend
differencing each variable until it is stationary. The next chapter considers unit roots
in a multivariate context. For now, it is sufficient to note that this recommendation
can lead to overdifferencing.


The interpretation of the transfer function depends on the type of differencing
performed. Consider the following three specifications and assume that |a_1| < 1:

$$y_t = a_1 y_{t-1} + c_0 z_t + \varepsilon_t \tag{5.11}$$
$$\Delta y_t = a_1 \Delta y_{t-1} + c_0 z_t + \varepsilon_t \tag{5.12}$$
$$\Delta y_t = a_1 \Delta y_{t-1} + c_0 \Delta z_t + \varepsilon_t \tag{5.13}$$

In (5.11), a one-unit shock in z_t has the initial effect of increasing y_t by c_0 units.
This initial effect decays at the rate a_1. In (5.12), a one-unit shock in z_t has the
initial effect of increasing the change in y_t by c_0 units. The effect on the change
decays at the rate a_1, but the effect on the level of the {y_t} sequence never decays. In
(5.13), the change in z_t affects the change in y_t. Here, a pulse in the {z_t} sequence
will have a temporary effect on the level of {y_t}. Questions 1 and 2 at the end of
this chapter are intended to help you gain familiarity with the different specifications.
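
The difference between the three specifications can be seen by tracing a one-time pulse in z through each of them. The sketch below is only illustrative: the values a_1 = 0.5 and c_0 = 1 are assumptions chosen for the example, all error terms are set to zero, and the function name `level_responses` is invented here.

```python
import numpy as np

def level_responses(a1=0.5, c0=1.0, horizon=40):
    """Level of y after a one-unit pulse in z at t = 0 under (5.11)-(5.13)."""
    z = np.zeros(horizon)
    z[0] = 1.0                      # one-time pulse in z
    dz = np.diff(z, prepend=0.0)    # change in z: +1 at t = 0, -1 at t = 1

    y_511 = np.zeros(horizon)       # level of y under (5.11)
    dy_512 = np.zeros(horizon)      # change in y under (5.12)
    dy_513 = np.zeros(horizon)      # change in y under (5.13)
    for t in range(horizon):
        lag_511 = y_511[t - 1] if t > 0 else 0.0
        lag_512 = dy_512[t - 1] if t > 0 else 0.0
        lag_513 = dy_513[t - 1] if t > 0 else 0.0
        y_511[t] = a1 * lag_511 + c0 * z[t]
        dy_512[t] = a1 * lag_512 + c0 * z[t]
        dy_513[t] = a1 * lag_513 + c0 * dz[t]

    # Cumulate the changes to recover the level of y for (5.12) and (5.13).
    return y_511, np.cumsum(dy_512), np.cumsum(dy_513)

lvl_511, lvl_512, lvl_513 = level_responses()
print("(5.11) level:", np.round(lvl_511[:5], 3), "... ->", round(lvl_511[-1], 6))
print("(5.12) level:", np.round(lvl_512[:5], 3), "... ->", round(lvl_512[-1], 6))
print("(5.13) level:", np.round(lvl_513[:5], 3), "... ->", round(lvl_513[-1], 6))
```

With these assumed values, the level response dies out under (5.11) and (5.13) but settles at c_0/(1 - a_1) = 2 under (5.12), matching the statement that the effect on the level never decays in that case.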


3. ESTIMATING A TRANSFER FUNCTION 


High-profile terrorist events (e.g., the hijacking of TWA flight 847 on June 14, 
1985; the hijacking of the Achille Lauro cruise ship on October 7, 1985; and the 
Abu Nidal attacks on the Vienna and Rome airports on December 27, 1985) caused 
much speculation in the press about tourists changing their travel plans. Although 
opinion polls of prospective tourists suggest that terrorism affects tourism, the true 
impact, if any, can best be discovered through the application of statistical tech- 
niques. Polls conducted in the aftermath of significant incidents cannot indicate 
whether respondents rebooked trips. Moreover, polls cannot account for tourists not 
surveyed who may be induced by lower prices to take advantage of offers designed 
to entice tourists back to a troubled spot. 

To measure the impact of terrorism on tourism, in Enders, Sandler, and Parise
(1992), we constructed the quarterly values of total receipts from tourism for 12
countries.⁵ The logarithmic share of each nation’s revenues was treated as the
dependent variable {y_t} and the number of transnational terrorist incidents occurring
within each nation as the independent variable {z_t}. The crucial assumption for the
use of intervention analysis is that there be no feedback from tourism to terrorism.
This assumption would be violated if changes in tourism induced terrorists to
change their activities.

Consider a transfer function in the form of (5.3): 


$$y_t = a_0 + A(L)y_{t-1} + C(L)z_t + B(L)\varepsilon_t$$

where y_t = the logarithmic share of a nation’s tourism revenues in quarter t
      z_t = the number of transnational terrorist incidents within that country
            during quarter t.⁶


If we use the methodology developed in the previous section, the first step in
fitting a transfer function is to fit an ARMA model to the {z_t} sequence. For
illustrative purposes, it is helpful to consider the Italian case since terrorism in Italy
appeared to be white-noise (with a constant mean of 4.20 incidents per quarter). Let
ρ_z(i) denote the autocorrelations between z_t and z_{t-i}. The correlogram for terrorist
attacks in Italy is:


Correlogram for Terrorist Attacks in Italy

ρ_z(0)   ρ_z(1)   ρ_z(2)   ρ_z(3)   ρ_z(4)   ρ_z(5)   ρ_z(6)   ρ_z(7)   ρ_z(8)
1        0.13     0.02     -0.06    -0.04    0.11     -0.01    0.00     -0.13


Each value of ρ_z(i) for i ≥ 1 is less than two standard deviations from zero. The
Ljung-Box Q-statistics for the significance of the first 4, 8, 12, and 16 lags are


Q(4) = 2.06, significance level = 0.725 
Q(8) = 4.52, significance level = 0.807 
Q(12) = 7.02, significance level = 0.855 
Q(16) = 8.06, significance level = 0.947 


Since terrorist incidents appear to be a white-noise process, we can skip Step 1;
there is no need to fit an ARMA model to the series or filter the {y_t} sequence for
Italy. At this point, we conclude that terrorists randomize their acts, so that the
number of incidents in quarter t is uncorrelated with the number of incidents in
previous periods.


Step 2 calls for obtaining the cross-correlogram between tourism and terrorism. 
The cross-correlogram is 


Cross-Correlogram Between Terrorism and Tourism in Italy

ρ_yz(0)  ρ_yz(1)  ρ_yz(2)  ρ_yz(3)  ρ_yz(4)  ρ_yz(5)  ρ_yz(6)  ρ_yz(7)  ρ_yz(8)
-0.18    -0.23    -0.24    -0.05    0.04     0.13     0.04     0.00     0.10


There are several interesting features of the cross-correlogram: 


1. With T observations and i lags, the theoretical value of the standard deviation of
each value of ρ_yz(i) is (T - i)^{-1/2}. With 73 observations, T^{-1/2} is approximately
equal to 0.117. At the 5% significance level (i.e., two standard deviations), the
sample value of ρ_yz(0) is not significantly different from zero and ρ_yz(1) and
ρ_yz(2) are just on the margin. However, the Q-statistic for ρ_yz(0) = ρ_yz(1) = ρ_yz(2)
= 0 is significant at the 0.01 level. Thus, there appears to be a strong negative
relationship between terrorism and tourism beginning at lag 1 or 2. The key issue
is to find the most appropriate model of the cross-correlations.


2. It is good practice to examine the cross-correlations between y_t and leading
values of z_{t+i}. If the current value of y_t tends to be correlated with future values of
z_{t+i}, it might be that the assumption of no feedback is violated. The presence of a
significant cross-correlation between y_t and leads of z_t might be due to the effect
of the current realization of y_t on future values of the {z_t} sequence.


3. Since ρ_yz(0) is not significantly different from zero at the 5% level, it is likely
that the delay factor is one quarter; it takes at least one quarter for tourists to
significantly revise their travel plans. However, there is no obvious pattern to the
cross-correlation function. It is wise to entertain the possibility of several plausible
models at this point in the process.
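
The arithmetic behind the first point can be reproduced from the printed cross-correlogram. This is a rough check, not the original computation: it assumes T = 73 usable observations at every lag, uses the two-decimal correlations from the table, and applies the Ljung-Box weighting T(T + 2) Σ r²(i)/(T - i).

```python
import numpy as np

T = 73                                    # usable observations (from the text)
r = np.array([-0.18, -0.23, -0.24])       # rho_yz(0), rho_yz(1), rho_yz(2)
lags = np.arange(len(r)).astype(float)

std_dev = (T - lags) ** -0.5              # (T - i)^(-1/2) at each lag
two_sd = 2.0 * std_dev                    # approximate 5% band

# Joint test that the first three cross-correlations are all zero.
q_stat = T * (T + 2) * np.sum(r**2 / (T - lags))

for i in range(len(r)):
    print(f"lag {i}: r = {r[i]: .2f}, two-sd band = {two_sd[i]:.3f}")
print(f"Q = {q_stat:.2f}")
```

With the rounded table entries the statistic comes out near 10.9. For comparison, the 1% critical value of a χ² distribution with three degrees of freedom is about 11.34; because the published correlations are rounded to two decimals, a small gap relative to the reported significance level is to be expected.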


Step 3 entails examining the cross-correlogram and estimating each of the plausible
models. Based on the ambiguous evidence of the cross-correlogram, several different
models for the transfer function were estimated. We experimented using delay
factors of zero, one, and two quarters. Since the decay pattern of the
cross-correlogram is also ambiguous, we allowed A(L) to have the form: a_1 y_{t-1} and
a_1 y_{t-1} + a_2 y_{t-2}. Some of our estimates are reported in Table 5.2.

Model 1 has the form y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + c_1 z_{t-1} + e_t. The problem with this
specification is that the intercept term a_0 is not significantly different from zero.
Eliminating this coefficient yields model 2. Notice that all coefficients in model 2
are significant at conventional levels and that the magnitude of each is quite reasonable.
The estimated value of c_1 is such that a terrorist incident reduces the logarithmic
share of Italy’s tourism by 0.003 in the following period. The point estimates
of the autoregressive coefficients imply imaginary characteristic roots (the roots are


Table 5.2 Terrorism and Tourism in Italy: Estimates from Step 2

          a0        a1        a2        c0         c1         c2         AIC/SBC
Model 1   0.0249    0.795     -0.469               -0.0046               -5.09/4.01
          (1.25)    (2.74)    (-1.63)              (-2.34)
Model 2             0.868     -0.696               -0.0030               -5.54/1.28
                    (4.52)    (-3.44)              (-2.23)
Model 3             1.09      -0.683    -0.0025                          -4.94/1.89
                    (4.51)    (-2.96)   (-2.10)
Model 4                                            -0.0025    -0.0019    -4.84/3.27
                                                   (-1.15)    (-0.945)
Model 5             0.217                          0.0025     -0.0027    -2.93/3.89
                    (-0.221)                       (-1.16)    (-0.080)

Note: The numbers in parentheses are the t-statistics for the null hypothesis of a zero coefficient.




0.434 ± 0.691i). Since these roots lie inside the unit circle, the effect of any incident
decays in a sine-wave pattern.

Model 3 changes the delay so as to allow z_t to have a contemporaneous effect on
y_t. The point estimates of the coefficients are reasonable and all are more than two
standard deviations from zero. However, both the AIC and SBC select model 2
over model 3. The appropriate delay seems to be one quarter.

Since the cross-correlogram seems to have two spikes and exhibits little decay,
we allowed both z_{t-1} and z_{t-2} to directly affect y_t. You can see that models 4 and 5
are inadequate in nearly all respects. Thus, we tentatively select model 2 as the
“best” model.

For Step 4, we obtained the {e_t} sequence from the residuals of model 2. Hence,

$$e_t = y_t - [-0.003z_{t-1}/(1 - 0.868L + 0.676L^2)] \tag{5.14}$$


The correlogram of these residuals is:

ρ(0)     ρ(1)     ρ(2)     ρ(3)     ρ(4)     ρ(5)     ρ(6)     ρ(7)     ρ(8)
1.0      0.621    0.554    0.431    0.419    0.150    0.066    0.021    -0.00

The residuals were then estimated as an ARMA process using standard Box-Jenkins
methods. Without going into details, we found that the best-fitting ARMA
model of the residuals is

$$e_t = 0.485e_{t-1} + 0.295e_{t-2} + (1 + 0.238L^4)\varepsilon_t \tag{5.15}$$

where the t-statistics for the coefficients = 4.08, 2.33, and 1.83 (significant at the
0.000, 0.023, and 0.071 levels), respectively.


At this point, our tentative transfer function is

$$y_t = [-0.003z_{t-1}/(1 - 0.868L + 0.676L^2)] + [(1 + 0.293L^4)\varepsilon_t/(1 - 0.485L - 0.246L^2)] \tag{5.16}$$


The problem with (5.16) is that the coefficients in the first expression were 
estimated separately from the coefficients in the second expression. In Step 5, we es- 
timated all coefficients simultaneously and obtained 


$$y_t = [-0.0022z_{t-1}/(1 - 0.876L + 0.749L^2)] + [(1 + 0.293L^4)\varepsilon_t/(1 - 0.504L - 0.245L^2)] \tag{5.17}$$


Note that the coefficients of (5.17) are similar to those of (5.16). The t-statistics
for the two numerator coefficients are -2.17 and 2.27, and the t-statistics for the
four denominator coefficients are -7.78, 5.20, -4.31, and -1.94, respectively. The
roots of the inverse characteristic equation for z_{t-1} are imaginary and outside the
unit circle (the inverse characteristic roots are 0.585 ± 0.996i, so that the
characteristic roots are 0.438 ± 0.746i). As in model 2, the effects of a terrorist incident
decay in a sine-wave pattern. The roots of the inverse characteristic equation for ε_t are
-3.29 and 1.238, so that the characteristic roots are -0.303 and 0.807. Also
note that you can obtain the original form of the model by multiplying (5.17) by
the two denominators.

The Ljung-Box Q-statistics indicate that the residuals of (5.17) appear to be
white-noise. For example, Q(8) = 6.54 and Q(16) = 12.67 with significance levels
of 0.587 and 0.696, respectively. Additional diagnostic checking included excluding
the MA(4) term in the numerator (since the significance level was 5.5%) and
estimating other plausible forms of the transfer function. All other models had
insignificant coefficients and/or larger values of the AIC and SBC and/or Q-statistics
indicating significant correlation in the estimated residuals. Hence, we concluded that
a model of the form (5.17) best captures the effects of terrorism on tourism in Italy.

Our ultimate aim was to use the estimated transfer function to simulate the effects
of a typical terrorist incident. Initializing the system such that all initial values
of {y_t} = 0 and setting all {ε_t} = 0, we let the value of z_t = 1. Figure 5.4
shows the impulse response function for this one-unit change in the {z_t} sequence.
As you can see from the figure, after a one-period delay, tourism in Italy declines
sharply. After a sustained decline, tourism returns to its initial value in approximately
1 year. The system has a memory and tourism again falls; notice the oscillating
decay pattern.

Integrating over the actual values of the {z_t} sequence allowed us to estimate
Italy’s total tourism losses. The undiscounted losses exceeded 600 million SDR;
with a 5% real interest rate, the total value of the losses was equal to 6% of
Italy’s annual revenues.
Figure 5.4 Italy’s share of tourism (impulse response analysis).
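
The roots quoted for (5.17) and the shape of the impulse response can be checked with a few lines of code. The sketch below is an illustration, not the original computation: it uses the point estimates printed in (5.17), treats the shock as a one-unit, one-period pulse in z (an assumption about how the experiment was run), sets every ε_t to zero, and picks an arbitrary 24-quarter horizon.

```python
import numpy as np

# Inverse characteristic equations from (5.17), highest power of L first.
inv_roots_z = np.roots([0.749, -0.876, 1.0])    # 1 - 0.876L + 0.749L^2 = 0
inv_roots_e = np.roots([-0.245, -0.504, 1.0])   # 1 - 0.504L - 0.245L^2 = 0
char_roots_z = 1.0 / inv_roots_z                # characteristic roots
char_roots_e = 1.0 / inv_roots_e

print("inverse roots (z part):", np.round(inv_roots_z, 3))
print("characteristic roots (z part):", np.round(char_roots_z, 3))
print("inverse roots (error part):", np.round(inv_roots_e, 3))
print("characteristic roots (error part):", np.round(char_roots_e, 3))

# Impulse response implied by the first term of (5.17) with all eps_t = 0:
#   y_t = 0.876 y_{t-1} - 0.749 y_{t-2} - 0.0022 z_{t-1}
horizon = 24
z = np.zeros(horizon)
z[0] = 1.0                                      # one incident in quarter 0
y = np.zeros(horizon)
for t in range(1, horizon):
    y_lag2 = y[t - 2] if t >= 2 else 0.0
    y[t] = 0.876 * y[t - 1] - 0.749 * y_lag2 - 0.0022 * z[t - 1]

for t in range(10):
    print(f"quarter {t}: {y[t]: .5f}")
```

Because the characteristic roots of the z part are complex with modulus below one, the simulated response oscillates while it dies out, which is the sine-wave decay described in the text.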




4. LIMITS TO STRUCTURAL MULTIVARIATE ESTIMATION 


There are two important difficulties involved in fitting a multivariate equation such 
as a transfer function. The first concerns the goal of fitting a parsimonious model. 
Obviously, a parsimonious model is preferable to an overparameterized model. In 
the relatively small samples usually encountered in economic data, estimating an 
unrestricted model may so severely limit degrees of freedom as to render forecasts 
useless. Moreover, the possible inclusion of large but insignificant coefficients will 
add variability to the model's forecasts. However, in paring down the form of the 
model, two equally skilled researchers will likely arrive at two different transfer 
functions. Although one model may have a better “fit” (in terms of the AIC or
SBC), the residuals of the other may have better diagnostic properties. There is
substantial truth to the consensus opinion that fitting a transfer function model has
many characteristics of an “art form.” There is a potential cost to using a
parsimonious model. Suppose you simply estimate the equation y_t = A(L)y_{t-1} + C(L)z_t +
B(L)ε_t using long lags for A(L), B(L), and C(L). As long as {z_t} is exogenous, the
estimated coefficients and forecasts are unbiased even though the model is
overparameterized. Such is not the case if the researcher improperly imposes zero
restrictions on any of the polynomials in the model.

The second problem concerns the assumption of no feedback from the {y_t} sequence
to the {z_t} sequence. For the coefficients of C(L) to be unbiased estimates
of the impact effects of {z_t} on the {y_t} sequence, z_t must be uncorrelated with {ε_t}
at all leads and lags. Although certain economic models may assert that policy vari- 
ables (such as the money supply or government spending) are exogenous, there 
may be feedback such that the policy variables are set with specific reference to the 
state of other variables in the system. To understand the problem of feedback, sup- 
pose that you were trying to keep a constant 70° temperature inside your apartment 
by turning up or down the thermostat. Of course, the “true” model is that turning up 
the heat (the intervention variable z_t) warms up your apartment (the {y_t} sequence).
However, intervention analysis cannot adequately capture the true relationship in 
the presence of feedback. Clearly, if you perfectly controlled the inside tempera- 
ture, there would be no correlation between the constant value of the inside temper- 
ature and the movement of the thermostat. Alternatively, you might listen to the 
weather forecast and turn up the thermostat whenever you expected it to be cold. If 
you underreact by not turning up the heat high enough, the cross-correlogram be- 
tween the two variables would tend to show a negative spike reflecting the drop in 
room temperature with the upward movement in the thermostat setting. Instead, if 
you overreact by greatly increasing the thermostat setting, both the room tempera- 
ture and the thermostat setting will rise together. However, the movement in room 
temperature will not be as great as the movement in the thermostat. Only if you
moved the thermostat setting without reference to room temperature, would we ex- 
pect to uncover the actual model. 
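
The thermostat story can be mimicked with a small simulation. Everything in this sketch is hypothetical: the feedback rule (the thermostat moves against expected outside cold by a factor k), the unit heating effect, the noise scale, and the function name `temp_thermostat_corr` are all assumptions made for illustration. The point is only that the sign of the observed correlation depends on the feedback rule, not on the "true" heating effect, which is positive throughout.

```python
import numpy as np

rng = np.random.default_rng(1)
outside = rng.standard_normal(500)        # hypothetical outside-temperature shocks
noise = 0.1 * rng.standard_normal(500)    # small independent disturbance

def temp_thermostat_corr(k, heat_effect=1.0):
    """Correlation of room temperature and thermostat under feedback strength k."""
    # Feedback rule: raise the thermostat when it is expected to be cold outside.
    thermostat = -k * outside / heat_effect
    # "True" model: turning up the heat warms the room.
    room = outside + heat_effect * thermostat + noise
    return np.corrcoef(room, thermostat)[0, 1]

corr_under = temp_thermostat_corr(0.5)    # underreaction
corr_exact = temp_thermostat_corr(1.0)    # perfect control
corr_over = temp_thermostat_corr(1.5)     # overreaction

print(f"underreact: {corr_under: .2f}")
print(f"perfect control: {corr_exact: .2f}")
print(f"overreact: {corr_over: .2f}")
```

Underreaction produces a negative correlation, overreaction a positive one, and near-perfect control leaves almost no correlation at all, even though the heating effect is identical in all three cases.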

The need to restrict the form of the transfer function and the problem of feedback
or “reverse causality” led Sims (1980) to propose a nonstructural estimation
strategy. To best understand the vector autoregression approach, it is useful to consider
the state of macroeconometric modeling that led Sims to his then radical ideas.


Multivariate Macroeconometric Models: 
Some Historical Background 


Traditionally, macroeconometric hypothesis tests and forecasts were conducted us- 
ing large-scale macroeconometric models. Usually, a complete set of structural 
equations was estimated one equation at a time. Then, all equations were aggre- 
gated in order to form overall macroeconomic forecasts. Consider two of the equa- 
tions from the Brookings quarterly econometric model of the United States as re- 
ported by Suits and Sparks (1965, p. 208): 


$$C_{NF} = \underset{(0.0165)}{0.0656}\,Y_D - \underset{(2.49)}{10.93}\,(P_{CNF}/P_C)_{t-1} + \underset{(0.0522)}{0.1889}\,(N + N_{ML})_{t-1}$$

$$C_{NEF} = 4.2712 + \underset{(0.0127)}{0.1691}\,Y_D - \underset{(0.0213)}{0.0743}\,(ALQD_{HH}/P_C)_{t-1}$$

where C_NF    = personal consumption expenditures on food
      Y_D     = disposable personal income
      P_CNF   = implicit price deflator for personal consumption expenditures
                on food
      P_C     = implicit price deflator for personal consumption expenditures
      N       = civilian population
      N_ML    = military population including armed forces overseas
      C_NEF   = personal consumption expenditures for nondurables other than
                food
      ALQD_HH = end-of-quarter stock of liquid assets held by households

and standard errors are in parentheses.

The remaining portions of the model contain estimates for the other components 
of aggregate consumption, investment spending, government spending, exports, 
imports, for the financial sector, various price determination equations, etc. Note 
that food expenditures, but not expenditures on other nondurables, are assumed to 
depend on relative price and population. However, expenditures for other non- 
durables are assumed to depend on real liquid assets held by households in the pre- 
vious quarter. 

Are such ad hoc behavioral assumptions consistent with economic theory? Sims 
(1980, p. 3), considers such multiequation models and argues that 


... What “economic theory” tells us about them is mainly that any variable
that appears on the right-hand side of one of these equations belongs
in principle on the right-hand side of all of them. To the extent
that models end up with very different sets of variables on the right-hand
side of these equations, they do so not by invoking economic theory,
but (in the case of demand equations) by invoking an intuitive
econometrician’s version of psychological and sociological theory,
since constraining utility functions are what is involved here.
Furthermore, unless these sets of equations are considered as a system
in the process of specification, the behavioral implications of the
restrictions on all equations taken together may be less reasonable than
the restrictions on any one equation taken by itself.


On the other hand, many of the monetarists used reduced-form equations to as- 
certain the effects of government policy on the macroeconomy. As an example, 
consider the following form of the so-called “St. Louis model” estimated by 
Anderson and Jordan (1968). Using U.S. quarterly data from 1952 to 1968, they es- 
timated the following reduced-form GNP determination equation: 


$$\Delta Y_t = 2.28 + 1.54\Delta M_t + 1.56\Delta M_{t-1} + 1.44\Delta M_{t-2} + 1.29\Delta M_{t-3} + 0.40\Delta E_t + 0.54\Delta E_{t-1} - 0.03\Delta E_{t-2} - 0.74\Delta E_{t-3} \tag{5.18}$$

where ΔY = change in nominal GNP
      ΔM = change in the monetary base
      ΔE = change in “high employment” budget deficit

In their analysis, Anderson and Jordan used base money and the high employ- 
ment budget deficit since these are the variables under the control of the monetary 
and fiscal authorities, respectively. The St. Louis model was an attempt to demon- 
strate the monetarist policy recommendations that changes in the money supply, 
but not changes in government spending or taxation, affected GNP. t-tests for the 
individual coefficients are misleading because of the substantial multicollinearity
between each variable and its lags. However, testing whether the sum of the mone- 
tary base coefficients (i.e., 1.54 + 1.56 + 1.44 + 1.29 = 5.83) differs from zero 
yields a t-value of 7.25. Hence, Anderson and Jordan concluded that changes in the 
money base translate into changes in nominal GNP. Since all the coefficients are 
positive, the effects of monetary policy are cumulative. On the other hand, the test 
that the sum of the fiscal coefficients (0.40 + 0.54 - 0.03 - 0.74 = 0.17) equals zero
yields a t-value of 0.54. According to Anderson and Jordan, the results support 
“lagged crowding out” in the sense that an increase in the budget deficit initially 
stimulates the economy. Over time, however, changes in interest rates and other 
macroeconomic variables lead to reductions in private sector expenditures. The cu- 
mulated effects of the fiscal stimulus are not statistically different from zero. 
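
The cumulative effects discussed above are simple running sums of the coefficients in (5.18). The short sketch below only reproduces that arithmetic from the printed coefficients; the variable names are illustrative.

```python
import numpy as np

# Coefficients on the current value and three lags, as printed in (5.18).
money = np.array([1.54, 1.56, 1.44, 1.29])     # change in the monetary base
fiscal = np.array([0.40, 0.54, -0.03, -0.74])  # change in the budget deficit

cum_money = np.cumsum(money)    # running sums: 1.54, 3.10, 4.54, 5.83
cum_fiscal = np.cumsum(fiscal)  # running sums: 0.40, 0.94, 0.91, 0.17

print("cumulative monetary effect:", np.round(cum_money, 2))
print("cumulative fiscal effect:  ", np.round(cum_fiscal, 2))
```

The monetary sums rise at every lag, while the fiscal sums peak after one quarter and then fall back toward 0.17, which is the "lagged crowding out" pattern described in the text.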

Sims (1980) also points out several problems with this type of analysis. Sims’
criticisms are easily understood by recognizing that (5.18) is a transfer function
with two independent variables {M_t} and {E_t} and no lags of the dependent
variable. As with any type of transfer function analysis, we must be concerned with:


1. Ensuring that lag lengths are appropriate. Serially correlated residuals in the
presence of lagged dependent variables lead to biased coefficient estimates.

2. Ensuring that there is no feedback between GNP and the money base or the budget
deficit. However, the assumption of no feedback is unreasonable since if the
monetary authorities (or the fiscal authorities) deliberately attempt to alter nominal
GNP, there is feedback. As in the thermostat example, if the monetary authority
attempts to control the economy by changing the money base, we could
not identify the “true” model. In the jargon of time-series econometrics, changes
in GNP would “cause” changes in the money supply. One appropriate strategy
would be to simultaneously estimate the GNP determination equation and
money supply feedback rule.


Comparing the two types of models, Sims (1980, pp. 14-15) states:

Because existing large models contain too many incredible restrictions,
empirical research aimed at testing competing macroeconomic theories
too often proceeds in a single- or few-equation framework. For this reason
alone, it appears worthwhile to investigate the possibility of building
large models in a style which does not tend to accumulate restrictions
so haphazardly. ... It should be feasible to estimate large-scale
macromodels as unrestricted reduced forms, treating all variables as
endogenous.


5. INTRODUCTION TO VAR ANALYSIS 


When we are not confident that a variable is actually exogenous, a natural extension
of transfer function analysis is to treat each variable symmetrically. In the
two-variable case, we can let the time path of {y_t} be affected by current and past
realizations of the {z_t} sequence and let the time path of the {z_t} sequence be affected
by current and past realizations of the {y_t} sequence. Consider the simple bivariate
system:


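$$y_t = b_{10} - b_{12}z_t + \gamma_{11}y_{t-1} + \gamma_{12}z_{t-1} + \varepsilon_{yt} \tag{5.19}$$
$$z_t = b_{20} - b_{21}y_t + \gamma_{21}y_{t-1} + \gamma_{22}z_{t-1} + \varepsilon_{zt} \tag{5.20}$$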


where it is assumed (1) that both y_t and z_t are stationary; (2) ε_yt and ε_zt are
white-noise disturbances with standard deviations of σ_y and σ_z, respectively; and
(3) {ε_yt} and {ε_zt} are uncorrelated white-noise disturbances.

Equations (5.19) and (5.20) constitute a first-order vector autoregression (VAR)
since the longest lag length is unity. This simple two-variable first-order VAR is
useful for illustrating the multivariate higher-order systems that are introduced in
Section 8. The structure of the system incorporates feedback since y_t and z_t are
allowed to affect each other. For example, -b_12 is the contemporaneous effect of a
unit change of z_t on y_t and γ_21 the effect of a unit change in y_{t-1} on z_t. Note that the
terms ε_yt and ε_zt are pure innovations (or shocks) in y_t and z_t, respectively. Of
course, if b_21 is not equal to zero, ε_yt has an indirect contemporaneous effect on z_t,
and if b_12 is not equal to zero, ε_zt has an indirect contemporaneous effect on y_t.
Such a system could be used to capture the feedback effects in our
temperature-thermostat example. The first equation allows current and past values of the
thermostat setting to affect the time path of the temperature; the second allows for
feedback between current and past values of the temperature and the thermostat
setting.⁷

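Equations (5.19) and (5.20) are not reduced-form equations since y_t has a
contemporaneous effect on z_t and z_t has a contemporaneous effect on y_t. Fortunately, it
is possible to transform the system of equations into a more usable form. Using
matrix algebra, we can write the system in the compact form:

$$\begin{bmatrix} 1 & b_{12} \\ b_{21} & 1 \end{bmatrix}\begin{bmatrix} y_t \\ z_t \end{bmatrix} = \begin{bmatrix} b_{10} \\ b_{20} \end{bmatrix} + \begin{bmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{bmatrix}\begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{bmatrix}$$

or

$$Bx_t = \Gamma_0 + \Gamma_1 x_{t-1} + \varepsilon_t$$

where

$$B = \begin{bmatrix} 1 & b_{12} \\ b_{21} & 1 \end{bmatrix},\quad x_t = \begin{bmatrix} y_t \\ z_t \end{bmatrix},\quad \Gamma_0 = \begin{bmatrix} b_{10} \\ b_{20} \end{bmatrix},\quad \Gamma_1 = \begin{bmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{bmatrix},\quad \varepsilon_t = \begin{bmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{bmatrix}$$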


Premultiplication by B^{-1} allows us to obtain the vector autoregressive (VAR)
model in standard form:

$$x_t = A_0 + A_1 x_{t-1} + e_t \tag{5.21}$$

where A_0 = B^{-1}Γ_0,
      A_1 = B^{-1}Γ_1,
      e_t = B^{-1}ε_t


For notational purposes, we can define a_{i0} as element i of the vector A_0, a_{ij} as the
element in row i and column j of the matrix A_1, and e_{it} as the element i of the vector
e_t. Using this new notation, we can rewrite (5.21) in the equivalent form:

$$y_t = a_{10} + a_{11}y_{t-1} + a_{12}z_{t-1} + e_{1t} \tag{5.22a}$$
$$z_t = a_{20} + a_{21}y_{t-1} + a_{22}z_{t-1} + e_{2t} \tag{5.22b}$$


To distinguish between the systems represented by (5.19) and (5.20) versus
(5.22a) and (5.22b), the first is called a structural VAR or the primitive system and
the second is called a VAR in standard form. It is important to note that the error
terms (i.e., e_1t and e_2t) are composites of the two shocks ε_yt and ε_zt. Since e_t = B^{-1}ε_t,
we can compute e_1t and e_2t as


$$e_{1t} = (\varepsilon_{yt} - b_{12}\varepsilon_{zt})/(1 - b_{12}b_{21}) \tag{5.23}$$
$$e_{2t} = (\varepsilon_{zt} - b_{21}\varepsilon_{yt})/(1 - b_{12}b_{21}) \tag{5.24}$$

Since ε_yt and ε_zt are white-noise processes, it follows that both e_1t and e_2t have
zero means, constant variances, and are individually serially uncorrelated. To
derive the properties of {e_1t}, first take the expected value of (5.23):

$$Ee_{1t} = E(\varepsilon_{yt} - b_{12}\varepsilon_{zt})/(1 - b_{12}b_{21}) = 0$$


The variance of e_1t is given by

$$Ee_{1t}^2 = E[(\varepsilon_{yt} - b_{12}\varepsilon_{zt})/(1 - b_{12}b_{21})]^2 = (\sigma_y^2 + b_{12}^2\sigma_z^2)/(1 - b_{12}b_{21})^2$$


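Thus, the variance of e_1t is time-independent. The autocovariances of e_1t and e_{1t-i}
are

$$Ee_{1t}e_{1t-i} = E[(\varepsilon_{yt} - b_{12}\varepsilon_{zt})(\varepsilon_{yt-i} - b_{12}\varepsilon_{zt-i})]/(1 - b_{12}b_{21})^2 = 0 \quad \text{for } i \neq 0$$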


Similarly, (5.24) can be used to demonstrate that e_2t is a stationary process with a
zero mean, constant variance, and having all autocovariances equal to zero. A critical
point to note is that e_1t and e_2t are correlated. The covariance of the two terms is

$$Ee_{1t}e_{2t} = E[(\varepsilon_{yt} - b_{12}\varepsilon_{zt})(\varepsilon_{zt} - b_{21}\varepsilon_{yt})]/(1 - b_{12}b_{21})^2 = -(b_{21}\sigma_y^2 + b_{12}\sigma_z^2)/(1 - b_{12}b_{21})^2 \tag{5.25}$$


In general, (5.25) will not be zero, so that the two shocks will be correlated. In
the special case where b_12 = b_21 = 0 (i.e., if there are no contemporaneous effects of
y_t on z_t and z_t on y_t), the shocks will be uncorrelated. It is useful to define the
variance/covariance matrix of the e_1t and e_2t shocks as

$$\Sigma = \begin{bmatrix} \operatorname{var}(e_{1t}) & \operatorname{cov}(e_{1t}, e_{2t}) \\ \operatorname{cov}(e_{1t}, e_{2t}) & \operatorname{var}(e_{2t}) \end{bmatrix}$$

Since all elements of Σ are time-independent, we can use the more compact form:

$$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix} \tag{5.26}$$

where var(e_it) = σ_i² and σ_12 = σ_21 = cov(e_1t, e_2t).
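
The mapping from the structural system to the standard form, and the resulting covariance between e_1t and e_2t, can be verified numerically. The parameter values below are hypothetical and chosen only for illustration; the code premultiplies by B^{-1} and compares the implied variance/covariance matrix with the closed-form expressions for the variance of e_1t and for (5.25).

```python
import numpy as np

# Hypothetical structural parameters (illustrative values only).
b10, b20 = 0.5, 0.2
b12, b21 = 0.3, -0.4
gamma1 = np.array([[0.6, 0.1],
                   [0.2, 0.5]])
sigma_y, sigma_z = 1.0, 0.8

B = np.array([[1.0, b12],
              [b21, 1.0]])
gamma0 = np.array([b10, b20])
B_inv = np.linalg.inv(B)

# Standard-form coefficients: A0 = B^{-1} Gamma0, A1 = B^{-1} Gamma1.
A0 = B_inv @ gamma0
A1 = B_inv @ gamma1

# Structural shocks are uncorrelated, so their covariance matrix is diagonal.
struct_cov = np.diag([sigma_y**2, sigma_z**2])
Sigma = B_inv @ struct_cov @ B_inv.T    # covariance matrix of e_t = B^{-1} eps_t

# Closed-form expressions from the text.
det = 1.0 - b12 * b21
var_e1 = (sigma_y**2 + b12**2 * sigma_z**2) / det**2
cov_e1e2 = -(b21 * sigma_y**2 + b12 * sigma_z**2) / det**2

print("A0 =", np.round(A0, 4))
print("A1 =\n", np.round(A1, 4))
print("Sigma =\n", np.round(Sigma, 4))
print("var(e1) from formula:", round(var_e1, 4))
print("cov(e1, e2) from (5.25):", round(cov_e1e2, 4))
```

Setting both b12 and b21 to zero in this sketch makes the off-diagonal element of Sigma vanish, which is the special case of uncorrelated shocks noted in the text.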


Stability and Stationarity 


In the first-order autoregressive model y_t = a_0 + a_1 y_{t-1} + ε_t, the stability condition is
that a_1 be less than unity in absolute value. There is a direct analogue between this
stability condition and the matrix A_1 in the first-order VAR model of (5.21). Using
the brute force method to solve the system, iterate (5.21) backward to obtain

$$\begin{aligned} x_t &= A_0 + A_1(A_0 + A_1 x_{t-2} + e_{t-1}) + e_t \\ &= (I + A_1)A_0 + A_1^2 x_{t-2} + A_1 e_{t-1} + e_t \end{aligned}$$

where I = 2 × 2 identity matrix.

After n iterations,

$$x_t = (I + A_1 + \cdots + A_1^n)A_0 + \sum_{i=0}^{n} A_1^i e_{t-i} + A_1^{n+1}x_{t-n-1}$$

As we continue to iterate backward, it is clear that convergence requires the
expression A_1^n vanish as n approaches infinity. As is shown below, stability requires
that the roots of (1 - a_11 L)(1 - a_22 L) - (a_12 a_21 L²) lie outside the unit circle (the
stability condition for higher-order systems is derived in the appendix to the next
chapter). For the time being, assume the stability condition is met, so that we can
write the particular solution for x_t as

$$x_t = \mu + \sum_{i=0}^{\infty} A_1^i e_{t-i} \tag{5.27}$$

where μ = [ȳ z̄]' and

$$\bar{y} = [a_{10}(1 - a_{22}) + a_{12}a_{20}]/\Delta, \qquad \bar{z} = [a_{20}(1 - a_{11}) + a_{21}a_{10}]/\Delta$$

$$\Delta = (1 - a_{11})(1 - a_{22}) - a_{12}a_{21}$$


If we take the expected value of (5.27), the unconditional mean of $x_t$ is $\mu$; hence, the unconditional means of $y_t$ and $z_t$ are $\bar{y}$ and $\bar{z}$, respectively. The variances and covariances of $y_t$ and $z_t$ can be obtained as follows. First, form the variance/covariance matrix as

$$
E(x_t - \mu)^2 = E\left[\sum_{i=0}^{\infty} A_1^i e_{t-i}\right]^2
$$

Next, using (5.26), note that

$$
Ee_t^2 = E\begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}\begin{bmatrix} e_{1t} & e_{2t} \end{bmatrix} = \Sigma
$$

Since $Ee_te_{t-i} = 0$ for $i \neq 0$, it follows that

$$
\begin{aligned}
E(x_t - \mu)^2 &= (I + A_1^2 + A_1^4 + A_1^6 + \cdots)\Sigma \\
&= [I - A_1^2]^{-1}\Sigma
\end{aligned}
$$

where it is assumed that the stability condition holds, so that $A_1^n$ approaches zero as $n$ approaches infinity.

If we can abstract from an initial condition, the $\{y_t\}$ and $\{z_t\}$ sequences will be jointly covariance stationary if the stability condition holds. Each sequence has a finite and time-invariant mean, and a finite and time-invariant variance.


In order to get another perspective on the stability condition, use lag operators to rewrite the VAR model of (5.22a) and (5.22b) as

$$
\begin{aligned}
y_t &= a_{10} + a_{11}Ly_t + a_{12}Lz_t + e_{1t} \\
z_t &= a_{20} + a_{21}Ly_t + a_{22}Lz_t + e_{2t}
\end{aligned}
$$

or

$$
\begin{aligned}
(1 - a_{11}L)y_t &= a_{10} + a_{12}Lz_t + e_{1t} \\
(1 - a_{22}L)z_t &= a_{20} + a_{21}Ly_t + e_{2t}
\end{aligned}
$$

If we use this last equation to solve for $z_t$, it follows that $Lz_t$ is

$$
Lz_t = L(a_{20} + a_{21}Ly_t + e_{2t})/(1 - a_{22}L)
$$

so that

$$
(1 - a_{11}L)y_t = a_{10} + a_{12}L[(a_{20} + a_{21}Ly_t + e_{2t})/(1 - a_{22}L)] + e_{1t}
$$

Notice that we have transformed the first-order VAR in the $\{y_t\}$ and $\{z_t\}$ sequences into a second-order stochastic difference equation in the $\{y_t\}$ sequence. Explicitly solving for $y_t$, we get

$$
y_t = \frac{a_{10}(1 - a_{22}) + a_{12}a_{20} + (1 - a_{22}L)e_{1t} + a_{12}e_{2t-1}}{(1 - a_{11}L)(1 - a_{22}L) - a_{12}a_{21}L^2}
\tag{5.28}
$$


In the same fashion, you should be able to demonstrate that the solution for $z_t$ is

$$
z_t = \frac{a_{20}(1 - a_{11}) + a_{21}a_{10} + (1 - a_{11}L)e_{2t} + a_{21}e_{1t-1}}{(1 - a_{11}L)(1 - a_{22}L) - a_{12}a_{21}L^2}
\tag{5.29}
$$


Both (5.28) and (5.29) have the same characteristic equation; convergence requires that the roots of the polynomial $(1 - a_{11}L)(1 - a_{22}L) - a_{12}a_{21}L^2$ lie outside the unit circle. (If you have forgotten the stability conditions for second-order difference equations, you might want to refresh your memory by reexamining Chapter 1.) As in any second-order difference equation, the roots may be real or complex and convergent or divergent. Notice that both $y_t$ and $z_t$ have the same characteristic equation; as long as both $a_{12}$ and $a_{21}$ do not equal zero, the solutions for the two sequences have the same characteristic roots. Hence, both will exhibit similar time paths.
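The stability check can be carried out directly from the four coefficients of $A_1$. The sketch below assumes the standard result that the characteristic roots of the second-order equation are the roots of $\lambda^2 - (a_{11} + a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21}) = 0$, which are the reciprocals of the roots in $L$ of the polynomial above; the function names and the coefficient values are hypothetical.

```python
import cmath


def characteristic_roots(a11, a12, a21, a22):
    """Roots of lambda^2 - (a11 + a22)*lambda + (a11*a22 - a12*a21) = 0.

    These are the reciprocals of the roots in L of
    (1 - a11*L)(1 - a22*L) - a12*a21*L^2.
    """
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = cmath.sqrt(trace * trace - 4.0 * det)  # may be complex
    return (trace + disc) / 2.0, (trace - disc) / 2.0


def is_stable(a11, a12, a21, a22):
    """Stable when both characteristic roots lie inside the unit circle."""
    return all(abs(r) < 1.0 for r in characteristic_roots(a11, a12, a21, a22))


# Illustrative coefficient values.
for coeffs in [(0.7, 0.2, 0.2, 0.7), (0.5, 0.5, 0.5, 0.5)]:
    print(coeffs, characteristic_roots(*coeffs), is_stable(*coeffs))
```

Using `cmath.sqrt` keeps the same code path for real and complex roots; the modulus of each root is what matters for convergence.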


Dynamics of a VAR Model 


Figure 5.5 shows the time paths of four simple systems. For each system, 100 sets of normally distributed random numbers representing the $\{e_{1t}\}$ and $\{e_{2t}\}$ sequences were drawn. The initial values of $y_0$ and $z_0$ were set equal to zero, and the $\{y_t\}$ and $\{z_t\}$ sequences were constructed as in (5.22a) and (5.22b). Graph (a) uses the values:

$$
a_{10} = a_{20} = 0, \quad a_{11} = a_{22} = 0.7, \quad \text{and} \quad a_{12} = a_{21} = 0.2
$$


When we substitute these values into (5.27), it is clear that the mean of each series is zero. From the quadratic formula, the two roots of the inverse characteristic equation $(1 - a_{11}L)(1 - a_{22}L) - a_{12}a_{21}L^2$ are 1.111 and 2.0. Since both are outside the unit circle, the system is stationary; the two characteristic roots of the solution for $\{y_t\}$ and $\{z_t\}$ are 0.9 and 0.5. Since these roots are positive, real, and less than unity, convergence will be direct. As you can see in the figure, there is a tendency for the sequences to move together. Since $a_{21}$ is positive, a large realization in $y_t$ induces a large realization of $z_{t+1}$; since $a_{12}$ is positive, a large realization of $z_t$ induces a large realization of $y_{t+1}$. The cross-correlations between the two series are positive.

The second graph (b) illustrates a stationary process with $a_{10} = a_{20} = 0$, $a_{11} = a_{22} = 0.5$, and $a_{12} = a_{21} = -0.2$. Again, the mean of each series is zero and the characteristic roots are 0.7 and 0.3. However, in contrast to the previous case, both $a_{21}$ and $a_{12}$ are negative, so that positive realizations of $y_t$ can be associated with negative realizations of $z_{t+1}$ and vice versa. As can be seen from the second graph, the two series appear to be negatively correlated.

Figure 5.5 Four VAR processes. Panels: (a) Stationary process 1; (b) a second stationary process; (c) Random walk process; (d) Random walk plus drift. The horizontal axis of each panel runs from 0 to 100.

In contrast, graph (c) shows a process possessing a unit root; here, $a_{11} = a_{22} = a_{12} = a_{21} = 0.5$. You should take a moment to find the characteristic roots. Undoubtedly, there is little tendency for either of the series to revert to a constant long-run value. Here, the intercept terms $a_{10}$ and $a_{20}$ are equal to zero, so that graph (c) represents a multivariate generalization of the random walk model. You can see how the series seem to meander together. In the fourth graph (d), the VAR process of graph (c) also contains a nonzero intercept term ($a_{10} = 0.5$ and $a_{20} = 0$) that takes the role of a "drift." As you can see from graph (d), the two series appear to move closely together. The drift term adds a deterministic time trend to the nonstationary behavior of the two series. Combined with the unit characteristic root, the $\{y_t\}$ and $\{z_t\}$ sequences are joint random walk plus drift processes. Notice that the drift dominates the long-run behavior of the series.
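Series of the kind plotted in Figure 5.5 can be generated with a few lines of code. This is a minimal sketch of the construction in (5.22a) and (5.22b), assuming independent standard normal draws for the two error sequences (the text does not state the variances or the correlation used for the figure); the function name, the seed, and the default arguments are hypothetical.

```python
import random


def simulate_var1(a10, a20, a11, a12, a21, a22, n_obs=100, seed=0):
    """Simulate a two-variable first-order VAR starting from y0 = z0 = 0.

    Assumption: e1 and e2 are independent N(0, 1) draws.
    Returns two lists of length n_obs + 1 (the initial zero is included).
    """
    rng = random.Random(seed)
    y, z = [0.0], [0.0]
    for _ in range(n_obs):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        y_prev, z_prev = y[-1], z[-1]
        y.append(a10 + a11 * y_prev + a12 * z_prev + e1)
        z.append(a20 + a21 * y_prev + a22 * z_prev + e2)
    return y, z


# The four parameter sets described for graphs (a) through (d).
cases = {
    "a": (0.0, 0.0, 0.7, 0.2, 0.2, 0.7),
    "b": (0.0, 0.0, 0.5, -0.2, -0.2, 0.5),
    "c": (0.0, 0.0, 0.5, 0.5, 0.5, 0.5),
    "d": (0.5, 0.0, 0.5, 0.5, 0.5, 0.5),
}
for label, params in cases.items():
    y, z = simulate_var1(*params)
    print(label, round(y[-1], 3), round(z[-1], 3))
```

Because the draws depend on the seed and on the assumed error distribution, the simulated paths will not reproduce the published figure exactly; only the qualitative behavior should be comparable.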


6. ESTIMATION AND IDENTIFICATION 


One explicit aim of the Box—Jenkins approach is to provide a methodology that 
leads to parsimonious models. The ultimate objective of making accurate short- 
term forecasts is best served by purging insignificant parameter estimates from the 




model. Sims’ (1980) criticisms of the “incredible identification restrictions” inher- 
ent in structural models argue for an alternative estimation strategy. Consider the 
following multivariate generalization of (5.21): 


$$
x_t = A_0 + A_1x_{t-1} + A_2x_{t-2} + \cdots + A_px_{t-p} + e_t
\tag{5.30}
$$

where $x_t$ = an $(n \times 1)$ vector containing each of the $n$ variables included in the VAR
$A_0$ = an $(n \times 1)$ vector of intercept terms
$A_i$ = $(n \times n)$ matrices of coefficients
and $e_t$ = an $(n \times 1)$ vector of error terms


Sims' methodology entails little more than a determination of the appropriate variables to include in the VAR and a determination of the appropriate lag length. The variables to be included in the VAR are selected according to the relevant economic model. Lag-length tests (to be discussed below) select the appropriate lag length. Otherwise, no explicit attempt is made to "pare down" the number of parameter estimates. The matrix $A_0$ contains $n$ intercept terms and each matrix $A_i$ contains $n^2$ coefficients; hence, $n + pn^2$ terms need to be estimated. Unquestionably, a VAR will be overparameterized in that many of these coefficient estimates can be properly excluded from the model. However, the goal is to find the important interrelationships among the variables and not make short-term forecasts. Improperly imposing zero restrictions may waste important information. Moreover, the regressors are likely to be highly collinear, so that the t-tests on individual coefficients may not be reliable guides for paring down the model.

Note that the right-hand side of (5.30) contains only predetermined variables and 
the error terms are assumed to be serially uncorrelated with constant variance. 
Hence, each equation in the system can be estimated using OLS. Moreover, OLS 
estimates are consistent and asymptotically efficient. Even though the errors are 
correlated across equations, seemingly unrelated regressions (SUR) do not add to 
the efficiency of the estimation procedure since both regressions have identical 
right-hand-side variables. 
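Equation-by-equation OLS for the two-variable first-order case can be sketched without any external library. The example below is illustrative only: the data are simulated with hypothetical coefficients and with errors that are deliberately correlated across equations, and the helper names (`solve_linear`, `ols`, `estimate_var1`) are not from the text.

```python
import random


def solve_linear(m, v):
    """Solve m x = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(regressors, target):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(regressors[0])
    xtx = [[sum(r[i] * r[j] for r in regressors) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(regressors, target)) for i in range(k)]
    return solve_linear(xtx, xty)


def estimate_var1(y, z):
    """Estimate each equation separately on the same regressors
    (constant, y_{t-1}, z_{t-1})."""
    x = [[1.0, y[t - 1], z[t - 1]] for t in range(1, len(y))]
    return ols(x, y[1:]), ols(x, z[1:])


# Simulated data with hypothetical coefficients and correlated errors.
rng = random.Random(1)
y, z = [0.0], [0.0]
for _ in range(5000):
    eps_y, eps_z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    e1, e2 = eps_y + 0.8 * eps_z, eps_z
    y_new = 0.7 * y[-1] + 0.2 * z[-1] + e1
    z_new = 0.2 * y[-1] + 0.7 * z[-1] + e2
    y.append(y_new)
    z.append(z_new)

coef_y, coef_z = estimate_var1(y, z)
print("y equation (a10, a11, a12):", coef_y)
print("z equation (a20, a21, a22):", coef_z)
```

Both regressions share the same regressor matrix, which is the situation in which SUR offers no efficiency gain over OLS.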

There is an issue of whether the variables in a VAR need to be stationary. Sims
(1980) and others, such as Doan (1992), recommend against differencing even if the 
variables contain a unit root. They argue that the goal of VAR analysis is to deter- 
mine the interrelationships among the variables, not the parameter estimates. The 
main argument against differencing is that it “throws away” information concerning 
the comovements in the data (such as the possibility of cointegrating relationships). 
Similarly, it is argued that the data need not be detrended. In a VAR, a trending 
variable will be well approximated by a unit root plus drift. However, the majority 
view is that the form of the variables in the VAR should mimic the true data-gener- 
ating process. This is particularly true if the aim is to estimate a structural model. 
We return to these issues in the next chapter; for now, it is assumed that all variables are stationary. Two sets of questions at the end of this chapter ask you to compare a VAR in levels to a VAR in first differences.




Identification 


To illustrate the identification procedure, return to the structural two-variable/first-order VAR represented by (5.19) and (5.20). Due to the feedback inherent in the system, these equations cannot be estimated directly. The reason is that $z_t$ is correlated with the error term $\epsilon_{yt}$ and $y_t$ with the error term $\epsilon_{zt}$. Standard estimation techniques require that the regressors be uncorrelated with the error term. Note there is no such problem in estimating the VAR system in standard form [i.e., in the form of (5.22a) and (5.22b)]. OLS can provide estimates of the two elements of $A_0$ and four elements of $A_1$. Moreover, by obtaining the residuals from the two regressions, it is possible to calculate estimates of the variances of $e_{1t}$ and $e_{2t}$ and of the covariance between $e_{1t}$ and $e_{2t}$. The issue is whether it is possible to recover all the information present in the primitive system (5.19) and (5.20) from the estimated system. In other words, is the primitive form identifiable given the OLS estimates of the VAR model in the form of (5.22a) and (5.22b)?

The answer to this question is "No, unless we are willing to appropriately restrict the primitive system." The reason is clear if we compare the number of parameters in the structural VAR with the number of parameters recovered from the standard form VAR model. Estimating (5.22a) and (5.22b) yields six coefficient estimates ($a_{10}$, $a_{20}$, $a_{11}$, $a_{12}$, $a_{21}$, and $a_{22}$) and the calculated values of $\operatorname{var}(e_{1t})$, $\operatorname{var}(e_{2t})$, and $\operatorname{cov}(e_{1t}, e_{2t})$. However, the primitive system (5.19) and (5.20) contains 10 parameters. In addition to the two intercept coefficients $b_{10}$ and $b_{20}$, the four autoregressive coefficients $\gamma_{11}$, $\gamma_{12}$, $\gamma_{21}$, and $\gamma_{22}$, and the two feedback coefficients $b_{12}$ and $b_{21}$, there are the two standard deviations $\sigma_y$ and $\sigma_z$. In all, the primitive system contains 10 parameters, whereas the VAR estimation yields only nine parameters. Unless one is willing to restrict one of the parameters, it is not possible to identify the primitive system; Equations (5.19) and (5.20) are underidentified. If exactly one parameter of the primitive system is restricted, the system is exactly identified, and if more than one parameter is restricted, the system is overidentified.

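One way to identify the model is to use the type of recursive system proposed by Sims (1980). Suppose that you are willing to impose a restriction on the primitive system such that the coefficient $b_{21}$ equals zero. Writing (5.19) and (5.20) with the constraint imposed yields

$$
y_t = b_{10} - b_{12}z_t + \gamma_{11}y_{t-1} + \gamma_{12}z_{t-1} + \epsilon_{yt}
\tag{5.31}
$$

$$
z_t = b_{20} + \gamma_{21}y_{t-1} + \gamma_{22}z_{t-1} + \epsilon_{zt}
\tag{5.32}
$$

Given the restriction (which might be suggested by a particular economic model), it is clear that $z_t$ has a contemporaneous effect on $y_t$, but $y_t$ affects the $\{z_t\}$ sequence with a one-period lag. Imposing the restriction $b_{21} = 0$ means that $B^{-1}$ is given by:

$$
B^{-1} = \begin{bmatrix} 1 & -b_{12} \\ 0 & 1 \end{bmatrix}
$$

Now, premultiplication of the primitive system by $B^{-1}$ yields:

$$
\begin{bmatrix} y_t \\ z_t \end{bmatrix} =
\begin{bmatrix} 1 & -b_{12} \\ 0 & 1 \end{bmatrix}\begin{bmatrix} b_{10} \\ b_{20} \end{bmatrix} +
\begin{bmatrix} 1 & -b_{12} \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{bmatrix}\begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix} +
\begin{bmatrix} 1 & -b_{12} \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \epsilon_{yt} \\ \epsilon_{zt} \end{bmatrix}
$$

or

$$
\begin{bmatrix} y_t \\ z_t \end{bmatrix} =
\begin{bmatrix} b_{10} - b_{12}b_{20} \\ b_{20} \end{bmatrix} +
\begin{bmatrix} \gamma_{11} - b_{12}\gamma_{21} & \gamma_{12} - b_{12}\gamma_{22} \\ \gamma_{21} & \gamma_{22} \end{bmatrix}\begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix} +
\begin{bmatrix} \epsilon_{yt} - b_{12}\epsilon_{zt} \\ \epsilon_{zt} \end{bmatrix}
\tag{5.33}
$$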


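Estimating the system using OLS yields the theoretical parameter estimates:

$$
\begin{aligned}
y_t &= a_{10} + a_{11}y_{t-1} + a_{12}z_{t-1} + e_{1t} \\
z_t &= a_{20} + a_{21}y_{t-1} + a_{22}z_{t-1} + e_{2t}
\end{aligned}
$$

where

$$
\begin{aligned}
a_{10} &= b_{10} - b_{12}b_{20} \\
a_{11} &= \gamma_{11} - b_{12}\gamma_{21} \\
a_{12} &= \gamma_{12} - b_{12}\gamma_{22} \\
a_{20} &= b_{20} \\
a_{21} &= \gamma_{21} \\
a_{22} &= \gamma_{22}
\end{aligned}
$$

Since $e_{1t} = \epsilon_{yt} - b_{12}\epsilon_{zt}$ and $e_{2t} = \epsilon_{zt}$, we can calculate the parameters of the variance/covariance matrix as

$$
\operatorname{Var}(e_1) = \sigma_y^2 + b_{12}^2\sigma_z^2
\tag{5.34a}
$$

$$
\operatorname{Var}(e_2) = \sigma_z^2
\tag{5.34b}
$$

$$
\operatorname{Cov}(e_1, e_2) = -b_{12}\sigma_z^2
\tag{5.34c}
$$

Thus, we have nine parameter estimates $a_{10}$, $a_{11}$, $a_{12}$, $a_{20}$, $a_{21}$, $a_{22}$, $\operatorname{var}(e_1)$, $\operatorname{var}(e_2)$, and $\operatorname{cov}(e_1, e_2)$ that can be substituted into the nine equations above in order to simultaneously solve for $b_{10}$, $b_{12}$, $\gamma_{11}$, $\gamma_{12}$, $b_{20}$, $\gamma_{21}$, $\gamma_{22}$, $\sigma_y^2$, and $\sigma_z^2$.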

Note also that the estimates of the $\{\epsilon_{yt}\}$ and $\{\epsilon_{zt}\}$ sequences can be recovered. The residuals from the second equation (i.e., the $\{e_{2t}\}$ sequence) are estimates of the $\{\epsilon_{zt}\}$ sequence. Combining these estimates along with the solution for $b_{12}$ allows us to calculate the estimates of the $\{\epsilon_{yt}\}$ sequence using the relationship $e_{1t} = \epsilon_{yt} - b_{12}\epsilon_{zt}$.

Take a minute to examine the restriction. In (5.32), the assumption $b_{21} = 0$ means that $y_t$ does not have a contemporaneous effect on $z_t$. In (5.33), the restriction manifests itself such that both $\epsilon_{yt}$ and $\epsilon_{zt}$ shocks affect the contemporaneous value of $y_t$ but only $\epsilon_{zt}$ shocks affect the contemporaneous value of $z_t$. The observed values of $e_{2t}$ are completely attributed to pure shocks to the $\{z_t\}$ sequence. Decomposing the residuals in this triangular fashion is called a Choleski decomposition.
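The nine relationships can be solved recursively, starting from the variance/covariance terms. The following is a minimal sketch of that inversion under the restriction $b_{21} = 0$; the function name, the dictionary keys, and the numerical values in the round-trip check are hypothetical.

```python
def recover_structural(a10, a11, a12, a20, a21, a22, var_e1, var_e2, cov_e12):
    """Solve the nine equations for the structural parameters when b21 = 0."""
    var_z = var_e2                       # (5.34b)
    b12 = -cov_e12 / var_z               # (5.34c)
    var_y = var_e1 - b12 ** 2 * var_z    # (5.34a)
    b20, g21, g22 = a20, a21, a22        # second equation is unchanged
    b10 = a10 + b12 * b20
    g11 = a11 + b12 * g21
    g12 = a12 + b12 * g22
    return {"b10": b10, "b12": b12, "g11": g11, "g12": g12,
            "b20": b20, "g21": g21, "g22": g22,
            "var_y": var_y, "var_z": var_z}


def recover_shocks(e1, e2, b12):
    """Recover (eps_y, eps_z) from one pair of residuals:
    eps_z = e2 and eps_y = e1 + b12 * eps_z."""
    return e1 + b12 * e2, e2


# Round trip with hypothetical structural values.
b10, b12, g11, g12 = 1.0, 0.5, 0.6, 0.3
b20, g21, g22, var_y, var_z = 2.0, 0.1, 0.4, 1.5, 2.0
reduced = dict(
    a10=b10 - b12 * b20, a11=g11 - b12 * g21, a12=g12 - b12 * g22,
    a20=b20, a21=g21, a22=g22,
    var_e1=var_y + b12 ** 2 * var_z, var_e2=var_z, cov_e12=-b12 * var_z,
)
print(recover_structural(**reduced))
```

In applied work the inputs would be OLS estimates and residual moments rather than exact values, so the recovered parameters inherit their sampling error.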




Examples of Overidentified Systems 


The interesting feature of overidentifying restrictions is that they can be tested. Suppose you wanted to further restrict (5.33) such that $\gamma_{21} = 0$. Such a restriction can have important economic implications; if $b_{21} = 0$ and $\gamma_{21} = 0$, contemporaneous $\epsilon_{yt}$ shocks and lagged values $y_{t-1}$ have no effect on $z_t$. Hence, the null hypothesis $b_{21} = \gamma_{21} = 0$ is equivalent to the hypothesis that $\{z_t\}$ is exogenous in that the $\{z_t\}$ sequence evolves independently of $\{y_t\}$. Given the form of (5.33), the test that $\gamma_{21} = 0$ is the test that $a_{21}$ in the VAR model is zero. To perform this test, simply estimate (5.33) and use a t-test to test whether $a_{21} = 0$.

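Not all testable restrictions are this straightforward. Consider another version of (5.19) and (5.20) such that $\gamma_{12} = \gamma_{21} = 0$:

$$
\begin{aligned}
y_t &= b_{10} + \gamma_{11}y_{t-1} + b_{12}z_t + \epsilon_{yt} \\
z_t &= b_{20} + b_{21}y_t + \gamma_{22}z_{t-1} + \epsilon_{zt}
\end{aligned}
$$

To write the system in standard VAR form, we can use direct substitution:

$$
\begin{aligned}
y_t &= b_{10} + \gamma_{11}y_{t-1} + b_{12}(b_{20} + b_{21}y_t + \gamma_{22}z_{t-1} + \epsilon_{zt}) + \epsilon_{yt} \\
z_t &= b_{20} + b_{21}(b_{10} + \gamma_{11}y_{t-1} + b_{12}z_t + \epsilon_{yt}) + \gamma_{22}z_{t-1} + \epsilon_{zt}
\end{aligned}
$$

It follows that

$$
\begin{aligned}
y_t &= a_{10} + a_{11}y_{t-1} + a_{12}z_{t-1} + e_{1t} \\
z_t &= a_{20} + a_{21}y_{t-1} + a_{22}z_{t-1} + e_{2t}
\end{aligned}
$$

where

$$
\begin{aligned}
a_{10} &= (b_{10} + b_{12}b_{20})/(1 - b_{12}b_{21}) \\
a_{11} &= \gamma_{11}/(1 - b_{12}b_{21}) \\
a_{12} &= b_{12}\gamma_{22}/(1 - b_{12}b_{21}) \\
a_{20} &= (b_{20} + b_{21}b_{10})/(1 - b_{12}b_{21}) \\
a_{21} &= b_{21}\gamma_{11}/(1 - b_{12}b_{21}) \\
a_{22} &= \gamma_{22}/(1 - b_{12}b_{21})
\end{aligned}
$$

Since $e_{1t} = (\epsilon_{yt} + b_{12}\epsilon_{zt})/(1 - b_{12}b_{21})$ and $e_{2t} = (b_{21}\epsilon_{yt} + \epsilon_{zt})/(1 - b_{12}b_{21})$, it follows that

$$
\begin{aligned}
\operatorname{Var}(e_{1t}) &= (\sigma_y^2 + b_{12}^2\sigma_z^2)/(1 - b_{12}b_{21})^2 \\
\operatorname{Var}(e_{2t}) &= (\sigma_z^2 + b_{21}^2\sigma_y^2)/(1 - b_{12}b_{21})^2 \\
\operatorname{Cov}(e_{1t}, e_{2t}) &= (b_{21}\sigma_y^2 + b_{12}\sigma_z^2)/(1 - b_{12}b_{21})^2
\end{aligned}
$$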


OLS provides estimates of the six values of the $a_{ij}$ and $\operatorname{var}(e_{1t})$, $\operatorname{var}(e_{2t})$, and $\operatorname{cov}(e_{1t}, e_{2t})$. These nine estimated values can be used with any eight of the nine equations above to solve for $b_{10}$, $b_{20}$, $b_{12}$, $b_{21}$, $\gamma_{11}$, $\gamma_{22}$, $\sigma_y$, and $\sigma_z$. Since there is an extra equation, the system is overidentified. Unfortunately, the overidentifying restriction here leads to nonlinear restrictions on the various $a_{ij}$. Nevertheless, many software packages can test such nonlinear restrictions using the methodology discussed in Section 8.


7. THE IMPULSE RESPONSE FUNCTION 


Just as an autoregression has a moving average representation, a vector autoregression can be written as a vector moving average (VMA). In fact, Equation (5.27) is the VMA representation of (5.21) in that the variables (i.e., $y_t$ and $z_t$) are expressed in terms of the current and past values of the two types of shocks (i.e., $e_{1t}$ and $e_{2t}$). The VMA representation is an essential feature of Sims' (1980) methodology in that it allows you to trace out the time path of the various shocks on the variables contained in the VAR system. For illustrative purposes, continue to use the two-variable/first-order model analyzed in the previous two sections. Writing (5.22a) and (5.22b) in matrix form, we get

$$
\begin{bmatrix} y_t \\ z_t \end{bmatrix} =
\begin{bmatrix} a_{10} \\ a_{20} \end{bmatrix} +
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}\begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix} +
\begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}
\tag{5.35}
$$


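or, using (5.27), we obtain

$$
\begin{bmatrix} y_t \\ z_t \end{bmatrix} =
\begin{bmatrix} \bar{y} \\ \bar{z} \end{bmatrix} +
\sum_{i=0}^{\infty}\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^i\begin{bmatrix} e_{1t-i} \\ e_{2t-i} \end{bmatrix}
\tag{5.36}
$$

Equation (5.36) expresses $y_t$ and $z_t$ in terms of the $\{e_{1t}\}$ and $\{e_{2t}\}$ sequences. However, it is insightful to rewrite (5.36) in terms of the $\{\epsilon_{yt}\}$ and $\{\epsilon_{zt}\}$ sequences. From (5.23) and (5.24), the vector of errors can be written as

$$
\begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix} =
[1/(1 - b_{12}b_{21})]\begin{bmatrix} 1 & -b_{12} \\ -b_{21} & 1 \end{bmatrix}\begin{bmatrix} \epsilon_{yt} \\ \epsilon_{zt} \end{bmatrix}
\tag{5.37}
$$

so that (5.36) and (5.37) can be combined to form

$$
\begin{bmatrix} y_t \\ z_t \end{bmatrix} =
\begin{bmatrix} \bar{y} \\ \bar{z} \end{bmatrix} +
[1/(1 - b_{12}b_{21})]\sum_{i=0}^{\infty}\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^i\begin{bmatrix} 1 & -b_{12} \\ -b_{21} & 1 \end{bmatrix}\begin{bmatrix} \epsilon_{yt-i} \\ \epsilon_{zt-i} \end{bmatrix}
$$

Since the notation is getting unwieldy, we can simplify by defining the 2 x 2 matrix $\phi_i$ with elements $\phi_{jk}(i)$:

$$
\phi_i = [A_1^i/(1 - b_{12}b_{21})]\begin{bmatrix} 1 & -b_{12} \\ -b_{21} & 1 \end{bmatrix}
$$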




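Hence, the moving average representation of (5.36) and (5.37) can be written in terms of the $\{\epsilon_{yt}\}$ and $\{\epsilon_{zt}\}$ sequences:

$$
\begin{bmatrix} y_t \\ z_t \end{bmatrix} =
\begin{bmatrix} \bar{y} \\ \bar{z} \end{bmatrix} +
\sum_{i=0}^{\infty}\begin{bmatrix} \phi_{11}(i) & \phi_{12}(i) \\ \phi_{21}(i) & \phi_{22}(i) \end{bmatrix}\begin{bmatrix} \epsilon_{yt-i} \\ \epsilon_{zt-i} \end{bmatrix}
$$

or more compactly,

$$
x_t = \mu + \sum_{i=0}^{\infty}\phi_i\epsilon_{t-i}
\tag{5.38}
$$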
i=0 5 B 


The moving average representation is an especially useful tool to examine the interaction between the $\{y_t\}$ and $\{z_t\}$ sequences. The coefficients of $\phi_i$ can be used to generate the effects of $\epsilon_{yt}$ and $\epsilon_{zt}$ shocks on the entire time paths of the $\{y_t\}$ and $\{z_t\}$ sequences. If you understand the notation, it should be clear that the four elements $\phi_{jk}(0)$ are impact multipliers. For example, the coefficient $\phi_{12}(0)$ is the instantaneous impact of a one-unit change in $\epsilon_{zt}$ on $y_t$. In the same way, the elements $\phi_{11}(1)$ and $\phi_{12}(1)$ are the one-period responses of unit changes in $\epsilon_{yt-1}$ and $\epsilon_{zt-1}$ on $y_t$, respectively. Updating by one period indicates that $\phi_{11}(1)$ and $\phi_{12}(1)$ also represent the effects of unit changes in $\epsilon_{yt}$ and $\epsilon_{zt}$ on $y_{t+1}$.


The accumulated effects of unit impulses in $\epsilon_{yt}$ and/or $\epsilon_{zt}$ can be obtained by the appropriate summation of the coefficients of the impulse response functions. For example, note that after $n$ periods, the effect of $\epsilon_{zt}$ on the value of $y_{t+n}$ is $\phi_{12}(n)$. Thus, after $n$ periods, the cumulated sum of the effects of $\epsilon_{zt}$ on the $\{y_t\}$ sequence is

$$
\sum_{i=0}^{n}\phi_{12}(i)
$$

Letting $n$ approach infinity yields the long-run multiplier. Since the $\{y_t\}$ and $\{z_t\}$ sequences are assumed to be stationary, it must be the case that for all $j$ and $k$,

$$
\sum_{i=0}^{\infty}\phi_{jk}(i)^2 \text{ is finite.}
$$
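The cumulated responses and the long-run multipliers can be computed from $A_1$ and the impact matrix $\phi_0$. This sketch relies on the definition $\phi_i = A_1^i\phi_0$ and on the geometric-sum result that, for a stable system, $\sum_{i=0}^{\infty}A_1^i\phi_0 = (I - A_1)^{-1}\phi_0$; the helper names and the numerical values are hypothetical.

```python
def mat_mul(p, q):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(p[r][k] * q[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def cumulated_responses(a1, phi0, n):
    """Sum of phi_0 through phi_n, where phi_i = A1^i phi_0."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    phi = phi0
    for _ in range(n + 1):
        total = [[total[r][c] + phi[r][c] for c in range(2)] for r in range(2)]
        phi = mat_mul(a1, phi)
    return total


def long_run_multipliers(a1, phi0):
    """Limit of the cumulated responses, valid only for a stable system."""
    i_minus_a = [[1.0 - a1[0][0], -a1[0][1]],
                 [-a1[1][0], 1.0 - a1[1][1]]]
    return mat_mul(inverse_2x2(i_minus_a), phi0)


A1 = [[0.7, 0.2], [0.2, 0.7]]            # illustrative coefficients
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]      # impact matrix with no feedback
print(cumulated_responses(A1, IDENTITY, 10))
print(long_run_multipliers(A1, IDENTITY))
```

Element `[0][1]` of each result corresponds to the effect of an $\epsilon_{zt}$ shock on the $\{y_t\}$ sequence.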


The four sets of coefficients $\phi_{11}(i)$, $\phi_{12}(i)$, $\phi_{21}(i)$, and $\phi_{22}(i)$ are called the impulse response functions. Plotting the impulse response functions [i.e., plotting the coefficients of $\phi_{jk}(i)$ against $i$] is a practical way to visually represent the behavior of the $\{y_t\}$ and $\{z_t\}$ series in response to the various shocks. In principle, it might be possible to know all the parameters of the primitive system (5.19) and (5.20). With such knowledge, it would be possible to trace out the time paths of the effects of pure $\epsilon_{yt}$ or $\epsilon_{zt}$ shocks. However, this methodology is not available to the researcher since an estimated VAR is underidentified. As explained in the previous section, knowledge of the various $a_{ij}$ and variance/covariance matrix $\Sigma$ is not sufficient to identify the primitive system. Hence, the econometrician must impose an additional restriction on the two-variable VAR system in order to identify the impulse responses.

One possible identification restriction is to use Choleski decomposition. For example, it is possible to constrain the system such that the contemporaneous value of $y_t$ does not have a contemporaneous effect on $z_t$. Formally, this restriction is represented by setting $b_{21} = 0$ in the primitive system. In terms of (5.37), the error terms can be decomposed as follows:

$$
e_{1t} = \epsilon_{yt} - b_{12}\epsilon_{zt}
\tag{5.39}
$$

$$
e_{2t} = \epsilon_{zt}
\tag{5.40}
$$

Thus, if we use (5.40), all the observed errors from the $\{e_{2t}\}$ sequence are attributed to $\epsilon_{zt}$ shocks. Given the calculated $\{\epsilon_{zt}\}$ sequence, knowledge of the values of the $\{e_{1t}\}$ sequence and the correlation coefficient between $e_{1t}$ and $e_{2t}$ allows for the calculation of the $\{\epsilon_{yt}\}$ sequence using (5.39). Although this Choleski decomposition constrains the system such that an $\epsilon_{yt}$ shock has no direct effect on $z_t$, there is an indirect effect in that lagged values of $y_t$ affect the contemporaneous value of $z_t$. The key point is that the decomposition forces a potentially important asymmetry on the system since an $\epsilon_{zt}$ shock has contemporaneous effects on both $y_t$ and $z_t$. For this reason (5.39) and (5.40) are said to imply an ordering of the variables. An $\epsilon_{zt}$ shock directly affects $e_{1t}$ and $e_{2t}$ but an $\epsilon_{yt}$ shock does not affect $e_{2t}$. Hence, $z_t$ is "prior" to $y_t$.

Suppose that estimates of equations (5.22a) and (5.22b) yield the values $a_{10} = a_{20} = 0$, $a_{11} = a_{22} = 0.7$, and $a_{12} = a_{21} = 0.2$. You will recall that this is precisely the model used in the simulation reported in graph (a) of Figure 5.5. Also suppose that the elements of the $\Sigma$ matrix are such that $\sigma_1^2 = \sigma_2^2$ and $\operatorname{cov}(e_{1t}, e_{2t})$ is such that the correlation coefficient between $e_{1t}$ and $e_{2t}$ (denoted by $\rho_{12}$) is 0.8. Hence, the decomposed errors can be represented by

$$
e_{1t} = \epsilon_{yt} + 0.8\epsilon_{zt}
\tag{5.41}
$$

$$
e_{2t} = \epsilon_{zt}
\tag{5.42}
$$
The top half of Figure 5.6, parts (a) and (b), traces out the effects of one-unit shocks to $\epsilon_{zt}$ and $\epsilon_{yt}$ on the time paths of the $\{y_t\}$ and $\{z_t\}$ sequences. As shown in the upper-left-hand graph (a), a one-unit shock in $\epsilon_{zt}$ causes $z_t$ to jump by one unit and $y_t$ to jump by 0.8 units. [From (5.41), 80% of the $\epsilon_{zt}$ shock has a contemporaneous effect on $e_{1t}$.] In the next period, $\epsilon_{zt+1}$ returns to zero, but the autoregressive nature of the system is such that $y_{t+1}$ and $z_{t+1}$ do not immediately return to their long-run values. Since $z_{t+1} = 0.2y_t + 0.7z_t + \epsilon_{zt+1}$, it follows that $z_{t+1} = 0.86$ [0.2(0.8) + 0.7(1) = 0.86]. Similarly, $y_{t+1} = 0.7y_t + 0.2z_t = 0.76$. As you can see from the figure, the subsequent values of the $\{y_t\}$ and $\{z_t\}$ sequences converge to their long-run levels. This convergence is assured by the stability of the system; as found earlier, the two characteristic roots are 0.5 and 0.9.

Figure 5.6 Two impulse response functions.
Model 1: $\begin{bmatrix} y_t \\ z_t \end{bmatrix} = \begin{bmatrix} 0.7 & 0.2 \\ 0.2 & 0.7 \end{bmatrix}\begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix} + \begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}$. Panels: (a) Response to $\epsilon_{zt}$ shock; (b) Response to $\epsilon_{yt}$ shock.
Model 2: $\begin{bmatrix} y_t \\ z_t \end{bmatrix} = \begin{bmatrix} 0.7 & -0.2 \\ -0.2 & 0.7 \end{bmatrix}\begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix} + \begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}$. Panels: (c) Response to $\epsilon_{zt}$ shock; (d) Response to $\epsilon_{yt}$ shock.
Solid line = $\{y_t\}$ sequence. Cross-hatch = $\{z_t\}$ sequence. The horizontal axis of each panel runs from 0 to 20.
Note: In all cases $e_{1t} = 0.8\epsilon_{zt} + \epsilon_{yt}$ and $e_{2t} = \epsilon_{zt}$.

The effects of a one-unit shock in $\epsilon_{yt}$ are shown in the upper-right-hand graph (b) of the figure. The asymmetry of the decomposition is immediately seen by comparing the two upper graphs. A one-unit shock in $\epsilon_{yt}$ causes the value of $y_t$ to increase by one unit; however, there is no contemporaneous effect on the value of $z_t$, so that $y_t = 1$ and $z_t = 0$. In the subsequent period, $\epsilon_{yt+1}$ returns to zero. The autoregressive nature of the system is such that $y_{t+1} = 0.7y_t + 0.2z_t = 0.7$ and $z_{t+1} = 0.2y_t + 0.7z_t = 0.2$. The remaining points in the figure are the impulse responses for periods $t + 2$ through $t + 20$. Since the system is stationary, the impulse responses ultimately decay.
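The numbers quoted for Model 1 can be reproduced by iterating $\phi_i = A_1\phi_{i-1}$ from the impact matrix implied by (5.41) and (5.42). This is a minimal sketch; the function name and the layout of the matrices (rows for $y$ and $z$, columns for $\epsilon_{yt}$ and $\epsilon_{zt}$ shocks) are choices made here for illustration.

```python
def impulse_responses(a1, phi0, horizon):
    """Return [phi_0, phi_1, ..., phi_horizon] with phi_i = A1 * phi_{i-1}.

    Rows index the responding variable (y, z); columns index the
    shock (eps_y, eps_z).
    """
    responses = [phi0]
    for _ in range(horizon):
        prev = responses[-1]
        responses.append(
            [[sum(a1[r][k] * prev[k][c] for k in range(2)) for c in range(2)]
             for r in range(2)]
        )
    return responses


A1 = [[0.7, 0.2], [0.2, 0.7]]
# From (5.41) and (5.42): e1 = eps_y + 0.8 * eps_z and e2 = eps_z.
PHI0 = [[1.0, 0.8], [0.0, 1.0]]

irf = impulse_responses(A1, PHI0, 20)
for i in (0, 1, 2, 20):
    y_to_z, z_to_z = irf[i][0][1], irf[i][1][1]
    y_to_y, z_to_y = irf[i][0][0], irf[i][1][0]
    print(i, "eps_z shock:", round(y_to_z, 4), round(z_to_z, 4),
          "| eps_y shock:", round(y_to_y, 4), round(z_to_y, 4))
```

At $i = 1$ the $\epsilon_{zt}$ column gives 0.76 and 0.86 and the $\epsilon_{yt}$ column gives 0.7 and 0.2, matching the values in the text.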


Can you figure out the consequences of reversing the Choleski decomposition in such a way that $b_{12}$, rather than $b_{21}$, is constrained to equal zero? Since matrix $A_1$ is symmetric (i.e., $a_{11} = a_{22}$ and $a_{12} = a_{21}$), the impulse responses of an $\epsilon_{yt}$ shock would be similar to those in graph (a) and the impulse responses of an $\epsilon_{zt}$ shock would be similar to those in graph (b). The only difference would be that the solid line represents the time path of the $\{z_t\}$ sequence and the hatched line the time path of the $\{y_t\}$ sequence.

As a practical matter, how does the researcher decide which of the alternative decompositions is most appropriate? In some instances, there might be a theoretical reason to suppose that one variable has no contemporaneous effect on the other. In the terrorism/tourism example, knowledge that terrorist incidents affect tourism with a lag suggests that terrorism does not have a contemporaneous effect on tourism. Usually, there is no such a priori knowledge. Moreover, the very idea of imposing a structure on a VAR system seems contrary to the spirit of Sims' argument against "incredible identifying restrictions." Unfortunately, there is no simple way to circumvent the problem; identification necessitates imposing some structure on the system. The Choleski decomposition provides a minimal set of assumptions that can be used to identify the primitive model.

It is crucial to note that the importance of the ordering depends on the magnitude of the correlation coefficient between $e_{1t}$ and $e_{2t}$. Let this correlation coefficient be denoted by $\rho_{12}$ so that $\rho_{12} = \sigma_{12}/\sigma_1\sigma_2$. Now suppose that the estimated model yields a value of $\Sigma$ such that $\rho_{12}$ is found to be equal to zero. In this circumstance, the ordering is immaterial. Formally, (5.41) and (5.42) become $e_{1t} = \epsilon_{yt}$ and $e_{2t} = \epsilon_{zt}$ when $\rho_{12} = 0$. Thus, if there is no correlation across equations, the residuals from the $y_t$ and $z_t$ equations are necessarily equivalent to the $\epsilon_{yt}$ and $\epsilon_{zt}$ shocks, respectively. At the other extreme, if $\rho_{12}$ is found to be unity, there is a single shock in the system that contemporaneously affects both variables. Under the assumption $b_{21} = 0$, (5.41) and (5.42) become $e_{1t} = \epsilon_{zt}$ and $e_{2t} = \epsilon_{zt}$; instead, under the assumption $b_{12} = 0$, (5.41) and (5.42) become $e_{1t} = \epsilon_{yt}$ and $e_{2t} = \epsilon_{yt}$. Usually, the researcher will want to test the significance of $\rho_{12}$; as a rule of thumb, if $|\rho_{12}| > 0.2$, the correlation is deemed to be significant. If $|\rho_{12}| > 0.2$, the usual procedure is to obtain the impulse response function using a particular ordering. Compare the results to the impulse response function obtained by reversing the ordering. If the implications are quite different, additional investigation into the relationships between the variables is necessary.

The lower half of Figure 5.6, parts (c) and (d), presents the impulse response functions for a second model; the sole difference between models 1 and 2 is the change in the values of $a_{12}$ and $a_{21}$ to -0.2. Model 2 was used in the simulation reported in graph (b) of Figure 5.5. The negative off-diagonal elements of $A_1$ weaken the tendency for the two series to move together. Using the decomposition represented by (5.41) and (5.42), graph (d) shows the effect of a one-unit $\epsilon_{yt}$ shock: with $y_t = 1$ and $z_t = 0$, it follows that $y_{t+1} = 0.7y_t - 0.2z_t = 0.7$ and $z_{t+1} = -0.2y_t + 0.7z_t = -0.2$. Tracing out the entire time path yields the lower-right-hand graph (d) of the figure. Since the system is stable, both sequences eventually converge to zero.

The lower-left-hand graph (c) traces out the effect of a one-unit $\epsilon_{zt}$ shock. In period $t$, $z_t$ rises by one unit and $y_t$ by 0.8 units. In period $(t + 1)$, $\epsilon_{zt+1}$ returns to zero, but the value of $y_{t+1}$ is $0.7y_t - 0.2z_t = 0.36$ and the value of $z_{t+1}$ is $-0.2y_t + 0.7z_t = 0.54$. The points represented by $t = 2$ through 20 show that the impulse responses converge to zero.


Variance Decomposition


Since unrestricted VARs are overparameterized, they are not particularly useful for short-term forecasts. However, understanding the properties of the forecast errors is exceedingly helpful in uncovering interrelationships among the variables in the system. Suppose that we knew the coefficients of $A_0$ and $A_1$ and wanted to forecast the various values of $x_{t+i}$ conditional on the observed value of $x_t$. Updating (5.21) one period (i.e., $x_{t+1} = A_0 + A_1x_t + e_{t+1}$) and taking the conditional expectation of $x_{t+1}$, we obtain

$$
E_tx_{t+1} = A_0 + A_1x_t
$$

Note that the one-step ahead forecast error is $x_{t+1} - E_tx_{t+1} = e_{t+1}$. Similarly, updating two periods, we get

$$
\begin{aligned}
x_{t+2} &= A_0 + A_1x_{t+1} + e_{t+2} \\
&= A_0 + A_1(A_0 + A_1x_t + e_{t+1}) + e_{t+2}
\end{aligned}
$$

If we take conditional expectations, the two-step ahead forecast of $x_{t+2}$ is

$$
E_tx_{t+2} = (I + A_1)A_0 + A_1^2x_t
$$

The two-step ahead forecast error (i.e., the difference between the realization of $x_{t+2}$ and the forecast) is $e_{t+2} + A_1e_{t+1}$. More generally, it is easily verified that the $n$-step ahead forecast is

$$
E_tx_{t+n} = (I + A_1 + A_1^2 + \cdots + A_1^{n-1})A_0 + A_1^nx_t
$$

and the associated forecast error is

$$
e_{t+n} + A_1e_{t+n-1} + A_1^2e_{t+n-2} + \cdots + A_1^{n-1}e_{t+1}
\tag{5.43}
$$
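The $n$-step ahead forecast can be obtained by iterating the one-step rule $E_tx_{t+j} = A_0 + A_1E_tx_{t+j-1}$, which is algebraically the same as the closed form above. The sketch below assumes known coefficients; the function name and the numerical values are hypothetical.

```python
def forecast(a0, a1, x_t, n):
    """Return E_t x_{t+n} for a two-variable first-order VAR.

    Iterating the one-step forecast n times gives
    (I + A1 + ... + A1^(n-1)) A0 + A1^n x_t.
    """
    x = list(x_t)
    for _ in range(n):
        x = [a0[r] + sum(a1[r][k] * x[k] for k in range(2)) for r in range(2)]
    return x


# Hypothetical coefficients and current observation.
A0 = [1.0, 0.0]
A1 = [[0.5, 0.0], [0.0, 0.5]]
X_T = [2.0, 4.0]

for n in (1, 2, 10):
    print(n, forecast(A0, A1, X_T, n))
```

For a stable system the forecasts approach the unconditional mean $\mu$ as $n$ grows, since $A_1^n$ vanishes.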
We can also consider these forecast errors in terms of (5.38) (i.e., the VMA form of the model). Of course, the VMA and VAR models contain exactly the same information, but it is convenient (and a good exercise) to describe the properties of the forecast errors in terms of the $\{\epsilon_t\}$ sequence. If we use (5.38) to conditionally forecast $x_{t+1}$, the one-step ahead forecast error is $\phi_0\epsilon_{t+1}$. In general,

$$
x_{t+n} = \mu + \sum_{i=0}^{\infty}\phi_i\epsilon_{t+n-i}
$$

so that the $n$-period forecast error $x_{t+n} - E_tx_{t+n}$ is

$$
x_{t+n} - E_tx_{t+n} = \sum_{i=0}^{n-1}\phi_i\epsilon_{t+n-i}
$$

Focusing solely on the $\{y_t\}$ sequence, we see that the $n$-step ahead forecast error is

$$
\begin{aligned}
y_{t+n} - E_ty_{t+n} = {} & \phi_{11}(0)\epsilon_{yt+n} + \phi_{11}(1)\epsilon_{yt+n-1} + \cdots + \phi_{11}(n-1)\epsilon_{yt+1} \\
& + \phi_{12}(0)\epsilon_{zt+n} + \phi_{12}(1)\epsilon_{zt+n-1} + \cdots + \phi_{12}(n-1)\epsilon_{zt+1}
\end{aligned}
$$

i 2 
Denote the variance of the $n$-step ahead forecast error of $y_{t+n}$ as $\sigma_y(n)^2$:

$$
\sigma_y(n)^2 = \sigma_y^2[\phi_{11}(0)^2 + \phi_{11}(1)^2 + \cdots + \phi_{11}(n-1)^2] + \sigma_z^2[\phi_{12}(0)^2 + \phi_{12}(1)^2 + \cdots + \phi_{12}(n-1)^2]
$$

Since all values of $\phi_{jk}(i)^2$ are necessarily nonnegative, the variance of the forecast error increases as the forecast horizon $n$ increases. Note that it is possible to decompose the $n$-step ahead forecast error variance due to each one of the shocks. Respectively, the proportions of $\sigma_y(n)^2$ due to shocks in the $\{\epsilon_{yt}\}$ and $\{\epsilon_{zt}\}$ sequences are

$$
\frac{\sigma_y^2[\phi_{11}(0)^2 + \phi_{11}(1)^2 + \cdots + \phi_{11}(n-1)^2]}{\sigma_y(n)^2}
$$

and

$$
\frac{\sigma_z^2[\phi_{12}(0)^2 + \phi_{12}(1)^2 + \cdots + \phi_{12}(n-1)^2]}{\sigma_y(n)^2}
$$
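The two proportions can be computed by accumulating squared impulse responses. The following sketch uses illustrative values: $A_1$ with 0.7 on the diagonal and 0.2 off the diagonal, the impact matrix implied by $e_{1t} = \epsilon_{yt} + 0.8\epsilon_{zt}$ and $e_{2t} = \epsilon_{zt}$, and shock variances of 0.36 and 1.0, which are what a correlation of 0.8 implies if both residual variances are normalized to one (that normalization is an assumption made here). The function name is hypothetical.

```python
def fevd_y(a1, phi0, var_eps_y, var_eps_z, n):
    """Shares of the n-step forecast error variance of y due to
    eps_y shocks and eps_z shocks, in that order."""
    phi = [row[:] for row in phi0]
    from_y = from_z = 0.0
    for _ in range(n):
        from_y += var_eps_y * phi[0][0] ** 2   # phi_11(i)^2 term
        from_z += var_eps_z * phi[0][1] ** 2   # phi_12(i)^2 term
        phi = [[sum(a1[r][k] * phi[k][c] for k in range(2)) for c in range(2)]
               for r in range(2)]
    total = from_y + from_z
    return from_y / total, from_z / total


A1 = [[0.7, 0.2], [0.2, 0.7]]
PHI0 = [[1.0, 0.8], [0.0, 1.0]]
VAR_EPS_Y, VAR_EPS_Z = 0.36, 1.0   # assumed normalization, see lead-in

for n in (1, 2, 5, 20):
    own, other = fevd_y(A1, PHI0, VAR_EPS_Y, VAR_EPS_Z, n)
    print(n, round(own, 4), round(other, 4))
```

The two shares sum to one at every horizon, and under this ordering the share of $\epsilon_{zt}$ in the forecast error variance of $y$ is already substantial at $n = 1$.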


The forecast error variance decomposition tells us the proportion of the movements in a sequence due to its "own" shocks versus shocks to the other variable. If $\epsilon_{zt}$ shocks explain none of the forecast error variance of $\{y_t\}$ at all forecast horizons, we can say that the $\{y_t\}$ sequence is exogenous. In such a circumstance, the $\{y_t\}$ sequence would evolve independently of the $\epsilon_{zt}$ shocks and $\{z_t\}$ sequence. At the other extreme, $\epsilon_{zt}$ shocks could explain all the forecast error variance in the $\{y_t\}$ sequence at all forecast horizons, so that $\{y_t\}$ would be entirely endogenous. In applied research, it is typical for a variable to explain almost all its forecast error variance at short horizons and smaller proportions at longer horizons. We would expect this pattern if $\epsilon_{zt}$ shocks had little contemporaneous effect on $y_t$ but acted to affect the $\{y_t\}$ sequence with a lag.

Note that the variance decomposition contains the same problem inherent in impulse response function analysis. In order to identify the $\{\varepsilon_{yt}\}$ and $\{\varepsilon_{zt}\}$ sequences, it is necessary to restrict the $B$ matrix. The Choleski decomposition used in (5.39) and (5.40) necessitates that all the one-period forecast error variance of $z_t$ is due to $\varepsilon_{zt}$. If we use the alternative ordering, all the one-period forecast error variance of $y_t$ would be due to $\varepsilon_{yt}$. The dramatic effects of these alternative assumptions are reduced at longer forecasting horizons. In practice, it is useful to examine the variance decomposition at various forecast horizons. As $n$ increases, the variance decompositions should converge. Moreover, if the correlation coefficient $\rho_{12}$ is significantly different from zero, it is customary to obtain the variance decompositions under various orderings.
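
The sensitivity of the decomposition to the ordering can be checked numerically. The following is only a sketch for a first-order, two-variable VAR; the coefficient matrix `A1`, the residual covariance `Sigma`, and the function names are hypothetical values chosen for illustration, not estimates from the text. It computes the share of each variable's forecast error variance attributed to each orthogonalized shock under both Choleski orderings.

```python
import numpy as np

def fevd(A1, Sigma, horizon):
    """Choleski-based variance decomposition for x_t = A1 x_{t-1} + e_t.

    Returns an (n, n) array whose (i, j) entry is the share of the
    horizon-step forecast error variance of variable i due to shock j.
    The first variable in the ordering has its one-step forecast error
    attributed entirely to its own shock.
    """
    n = A1.shape[0]
    P = np.linalg.cholesky(Sigma)      # lower triangular, Sigma = P P'
    contributions = np.zeros((n, n))
    Phi = np.eye(n)                    # powers of A1, starting at A1**0
    for _ in range(horizon):
        contributions += (Phi @ P) ** 2
        Phi = A1 @ Phi
    return contributions / contributions.sum(axis=1, keepdims=True)

def fevd_ordered(A1, Sigma, horizon, order):
    """Same decomposition with the variables placed in the given Choleski order."""
    order = list(order)
    idx = np.ix_(order, order)
    shares = fevd(A1[idx], Sigma[idx], horizon)
    back = np.argsort(order)           # map results back to the original labels
    return shares[np.ix_(back, back)]

# Hypothetical system; variable 0 plays the role of y, variable 1 of z.
A1 = np.array([[0.7, 0.2],
               [0.2, 0.7]])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])

for horizon in (1, 4, 12):
    z_first = fevd_ordered(A1, Sigma, horizon, order=[1, 0])
    y_first = fevd_ordered(A1, Sigma, horizon, order=[0, 1])
    # Row 0 holds the shares of y's forecast error variance.
    print(horizon, z_first[0].round(3), y_first[0].round(3))
```

With strongly correlated residuals, as assumed here, the two orderings give visibly different shares at short horizons; setting the off-diagonal elements of `Sigma` to zero makes the two orderings coincide.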

Nevertheless, impulse response analysis and variance decompositions (together 
called innovation accounting) can be useful tools to examine the relationships 
among economic variables. If the correlations among the various innovations are 
small, the identification problem is not likely to be especially important. The alter- 
native orderings should yield similar impulse responses and variance decomposi- 
tions. Of course, the contemporaneous movements of many economic variables are 
highly correlated. Sections 10 through 13 consider two attractive methods that can 
be used to identify the structural innovations. Before examining these techniques 
we consider hypothesis testing in a VAR framework and reexamine the interrela- 
tionships between terrorism and tourism. 


8. HYPOTHESIS TESTING 


In principle, there is nothing to prevent you from incorporating a large number of 
variables in the VAR. It is possible to construct an n-equation VAR with each 
equation containing p lags of all n variables in the system. You will want to include 
those variables that have important economic effects on each other. As a practical 
matter, degrees of freedom are quickly eroded as more variables are included. For 
example, with monthly data with 12 lags, the inclusion of one additional variable 
uses an additional 12 degrees of freedom. A careful examination of the relevant
theoretical model will help you to select the set of variables to include in your VAR
model.
An n-equation VAR can be represented by 


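$$
\begin{bmatrix} x_{1t} \\ x_{2t} \\ \vdots \\ x_{nt} \end{bmatrix}
=
\begin{bmatrix} A_{10} \\ A_{20} \\ \vdots \\ A_{n0} \end{bmatrix}
+
\begin{bmatrix}
A_{11}(L) & A_{12}(L) & \cdots & A_{1n}(L) \\
A_{21}(L) & A_{22}(L) & \cdots & A_{2n}(L) \\
\vdots & \vdots & \ddots & \vdots \\
A_{n1}(L) & A_{n2}(L) & \cdots & A_{nn}(L)
\end{bmatrix}
\begin{bmatrix} x_{1t-1} \\ x_{2t-1} \\ \vdots \\ x_{nt-1} \end{bmatrix}
+
\begin{bmatrix} e_{1t} \\ e_{2t} \\ \vdots \\ e_{nt} \end{bmatrix}
\tag{5.44}
$$

where $A_{i0}$ = the parameters representing intercept terms
$A_{ij}(L)$ = the polynomials in the lag operator $L$.

The individual coefficients of $A_{ij}(L)$ are denoted by $a_{ij}(1), a_{ij}(2), \ldots$. Since all equations have the same lag length, all the polynomials $A_{ij}(L)$ are of the same degree. The terms $e_{it}$ are white-noise disturbances that may be correlated. Again, designate the variance/covariance matrix by $\Sigma$, where the dimension of $\Sigma$ is $(n \times n)$.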

In addition to the determination of the set of variables to include in the VAR, it 
is important to determine the appropriate lag length. One possible procedure is to 
allow for different lag lengths for each variable in each equation. However, in or- 
der to preserve the symmetry of the system (and to be able to use OLS efficiently), 
it is common to use the same lag length for all equations. As indicated in Section 
6, as long as there are identical regressors in each equation, OLS estimates are 
consistent and asymptotically efficient. If some of the VAR equations have regres- 
sors not included in the others, seemingly unrelated regressions (SUR) provide ef- 
ficient estimates of the VAR coefficients. Hence, when there is a good reason to 
let lag lengths differ across equations, estimate the so-called near VAR using 
SUR. 

In a VAR, long lag lengths quickly consume degrees of freedom. If lag length is 
p, each of the n equations contains np coefficients plus the intercept term. 
Appropriate lag-length selection can be critical. If p is too small, the model is mis- 
specified; if p is too large, degrees of freedom are wasted. To check lag length, be- 
gin with the longest plausible length or longest feasible length given degrees-of- 
freedom considerations. Estimate the VAR and form the variance/covariance 
matrix of the residuals. Using quarterly data, you might start with a lag length of 12 
quarters based on the a priori notion that 3 years is sufficiently long to capture the 
system’s dynamics. Call the variance/covariance matrix of the residuals from the 
12-lag model $\Sigma_{12}$. Now suppose you want to determine whether eight lags are appropriate. After all, restricting the model from 12 to eight lags would reduce the number of estimated parameters by $4n$ in each equation.

Since the goal is to determine whether lag 8 is appropriate for all equations, an 
equation by equation F-test on lags 9 through 12 is not appropriate. Instead, the 
proper test for this cross-equation restriction is a likelihood ratio test. Reestimate 
the VAR over the same sample period using eight lags and obtain the variance/covariance matrix of the residuals $\Sigma_8$. Note that $\Sigma_8$ pertains to a system of $n$ equations with $4n$ restrictions in each equation for a total of $4n^2$ restrictions. The likelihood ratio statistic is


$$T\,(\log|\Sigma_8| - \log|\Sigma_{12}|)$$


However, given the sample sizes usually found in economic analysis, Sims 
(1980) recommends using 


$$(T - c)(\log|\Sigma_8| - \log|\Sigma_{12}|)$$

where $T$ = number of usable observations
$c$ = number of parameters estimated in each equation of the unrestricted system
$\log|\Sigma_n|$ = the natural logarithm of the determinant of $\Sigma_n$.


In the example at hand, c = 12n + 1 since each equation of the unrestricted model 
has 12 lags for each variable term plus an intercept. 

This statistic has the asymptotic $\chi^2$ distribution with degrees of freedom equal to the number of restrictions in the system. In the example under consideration, there are $4n$ restrictions in each equation, for a total of $4n^2$ restrictions in the system. Clearly, if the restriction of a reduced number of lags is not binding, we would expect $\log|\Sigma_8|$ to be equal to $\log|\Sigma_{12}|$. Large values of this sample statistic indicate that only eight lags is a binding restriction; hence, a rejection of the null hypothesis that lag length = 8. If the calculated value of the statistic is less than $\chi^2$ at a prespecified significance level, we would not be able to reject the null of only eight lags. At that point, we could seek to determine whether four lags were appropriate by constructing

$$(T - c)(\log|\Sigma_4| - \log|\Sigma_8|)$$


Considerable care should be taken in paring down lag length in this fashion. 
Often, this procedure will not reject the null hypotheses of eight versus 12 lags and
four versus eight lags, although it will reject a null of four versus 12 lags. The problem
with paring down the model is that you may lose a small amount of explanatory 
power at each stage. Overall, the total loss in explanatory power can be significant. 
In such circumstances, it is best to use the longer lag lengths. 
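
The lag-length test described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the simulated data, and the choice to divide the residual cross-products by $T$ are mine, and both models are estimated over the same sample (the one usable with the longer lag length), as the text requires.

```python
import numpy as np

def var_residual_cov(data, p, max_lag):
    """OLS-estimate a VAR(p) with intercepts and return (Sigma, T).

    The first max_lag observations are dropped so that models with
    different lag lengths are compared over an identical sample.
    """
    data = np.asarray(data, dtype=float)
    total, n = data.shape
    Y = data[max_lag:]
    blocks = [np.ones((total - max_lag, 1))]
    blocks += [data[max_lag - k: total - k] for k in range(1, p + 1)]
    X = np.hstack(blocks)
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    T = Y.shape[0]
    return resid.T @ resid / T, T

def sims_lr_stat(data, p_restricted, p_unrestricted):
    """Small-sample adjusted statistic (T - c)(log|Sigma_r| - log|Sigma_u|)."""
    n = np.asarray(data).shape[1]
    sigma_u, T = var_residual_cov(data, p_unrestricted, p_unrestricted)
    sigma_r, _ = var_residual_cov(data, p_restricted, p_unrestricted)
    c = n * p_unrestricted + 1         # regressors per unrestricted equation
    logdet_u = np.linalg.slogdet(sigma_u)[1]
    logdet_r = np.linalg.slogdet(sigma_r)[1]
    stat = (T - c) * (logdet_r - logdet_u)
    df = n * n * (p_unrestricted - p_restricted)
    return stat, df

# Simulated two-variable data whose true lag length is one (illustration only).
rng = np.random.default_rng(0)
data = np.zeros((300, 2))
for t in range(1, 300):
    data[t] = 0.5 * data[t - 1] + rng.standard_normal(2)

stat, df = sims_lr_stat(data, p_restricted=1, p_unrestricted=4)
print(stat, df)   # compare stat with a chi-square critical value on df degrees of freedom
```

The statistic is then compared with the critical value from a $\chi^2$ table for `df` degrees of freedom.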

This type of likelihood ratio test is applicable to any type of cross-equation restriction. Let $\Sigma_u$ and $\Sigma_r$ be the variance/covariance matrices of the unrestricted and restricted systems, respectively. If the equations of the unrestricted model contain different regressors, let $c$ denote the maximum number of regressors contained in the longest equation. Asymptotically, the test statistic

$$(T - c)(\log|\Sigma_r| - \log|\Sigma_u|) \tag{5.45}$$

has a $\chi^2$ distribution with degrees of freedom equal to the number of restrictions in the system.

To take another example, suppose you wanted to capture seasonal effects by in- 
cluding three seasonal dummies in each of the n equations of a VAR. Estimate the 
unrestricted model by including the dummy variables and estimate the restricted 
model by excluding the dummies. The total number of restrictions in the system is 
$3n$. If lag length is $p$, the equations of the unrestricted model have $np + 4$ parameters ($np$ lagged variables, the intercept, and the three seasonals). For $T$ usable observations, set $c = np + 4$ and calculate the value of (5.45). If, for some prespecified significance level, this calculated $\chi^2$ value (with $3n$ degrees of freedom) exceeds the critical value, the restriction of no seasonal effects can be rejected. Equation (5.45) can also be used to test the type of nonlinear restriction mentioned in Section 6. Estimate the restricted and unrestricted systems. Then compare the calculated value of (5.45) to the critical value found in a $\chi^2$ table.

The likelihood ratio test is based on asymptotic theory that may not be very use- 
ful in the small samples available to time-series econometricians. Moreover, the 
likelihood ratio test is only applicable when one model is a restricted version of the 
other. Alternative test criteria to determine appropriate lag lengths and/or seasonal- 
ity are the multivariate generalizations of the AIC and SBC: 


$$\text{AIC} = T\log|\Sigma| + 2N$$
$$\text{SBC} = T\log|\Sigma| + N\log(T)$$

where $|\Sigma|$ = determinant of the variance/covariance matrix of the residuals
$N$ = total number of parameters estimated in all equations.


Thus, if each equation in an $n$-variable VAR has $p$ lags and an intercept, $N = n^2p + n$; each of the $n$ equations has $np$ lagged regressors and an intercept.

Adding additional regressors will reduce $\log|\Sigma|$ at the expense of increasing $N$. As in the univariate case, select the model having the lowest AIC or SBC value. Make sure that you adequately compare the models by using the same sample period. Note that these statistics are not based on any distributional theory; as such they are not used in testing the type of cross-equation restrictions discussed in Section 6.
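
As a rough illustration of how the two criteria trade off fit against the number of parameters, the sketch below evaluates them for several lag lengths over a common sample. The function name, the simulated series, and the maximum lag of four are assumptions made for the example; only the formulas for AIC, SBC, and $N = n^2p + n$ come from the text.

```python
import numpy as np

def var_information_criteria(data, p, max_lag):
    """Return (AIC, SBC, N) for a VAR(p) with intercepts.

    All candidate lag lengths should be evaluated with the same max_lag
    so that every model is fitted to the same sample period.
    """
    data = np.asarray(data, dtype=float)
    total, n = data.shape
    Y = data[max_lag:]
    blocks = [np.ones((total - max_lag, 1))]
    blocks += [data[max_lag - k: total - k] for k in range(1, p + 1)]
    X = np.hstack(blocks)
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    T = Y.shape[0]
    logdet = np.linalg.slogdet(resid.T @ resid / T)[1]
    N = n * n * p + n                  # np lag coefficients and an intercept per equation
    aic = T * logdet + 2 * N
    sbc = T * logdet + N * np.log(T)
    return aic, sbc, N

# Simulated first-order system (illustration only).
rng = np.random.default_rng(0)
data = np.zeros((500, 2))
for t in range(1, 500):
    data[t] = 0.5 * data[t - 1] + rng.standard_normal(2)

for p in range(0, 5):
    aic, sbc, N = var_information_criteria(data, p, max_lag=4)
    print(p, N, round(aic, 2), round(sbc, 2))
```

Because the SBC penalty $N\log(T)$ exceeds $2N$ once $T$ is larger than about 7, the SBC never selects a longer lag length than the AIC on the same sample.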


Granger Causality 


A test of causality is whether the lags of one variable enter into the equation for 
another variable. Recall that in (5.33), it was possible to test the hypothesis that $a_{21} = 0$ using a $t$-test. In a two-equation model with $p$ lags, $\{y_t\}$ does not Granger cause $\{z_t\}$ if and only if all the coefficients of $A_{21}(L)$ are equal to zero. Thus, if $\{y_t\}$ does not improve the forecasting performance of $\{z_t\}$, then $\{y_t\}$ does not Granger cause $\{z_t\}$. The direct way to determine Granger causality is to use a standard $F$-test to test the restriction:

$$a_{21}(1) = a_{21}(2) = a_{21}(3) = \cdots = 0$$

In the $n$ variable case in which $A_{ij}(L)$ represents the coefficients of lagged values of variable $j$ on variable $i$, variable $j$ does not Granger cause variable $i$ if all coefficients of the polynomial $A_{ij}(L)$ can be set equal to zero.
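
The $F$-test for the two-variable case can be sketched as a comparison of restricted and unrestricted sums of squared residuals from the equation for the second variable. The function name and the simulated series below are illustrative assumptions; the test itself is the standard $F$-test on the exclusion of the lagged values of the first variable.

```python
import numpy as np

def granger_f_stat(y, z, p):
    """F-statistic for H0: the p lags of y do not enter the z equation."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    total = len(z)
    dep = z[p:]
    const = np.ones((total - p, 1))
    z_lags = np.column_stack([z[p - k: total - k] for k in range(1, p + 1)])
    y_lags = np.column_stack([y[p - k: total - k] for k in range(1, p + 1)])
    X_r = np.hstack([const, z_lags])             # restricted: own lags only
    X_u = np.hstack([const, z_lags, y_lags])     # unrestricted: adds lags of y

    def ssr(X):
        coef = np.linalg.lstsq(X, dep, rcond=None)[0]
        return np.sum((dep - X @ coef) ** 2)

    ssr_r, ssr_u = ssr(X_r), ssr(X_u)
    df_num = p
    df_den = len(dep) - X_u.shape[1]
    F = ((ssr_r - ssr_u) / df_num) / (ssr_u / df_den)
    return F, df_num, df_den

# Simulated example in which y helps to forecast z but not the reverse.
rng = np.random.default_rng(0)
y = np.zeros(300)
z = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()
    z[t] = 0.5 * z[t - 1] + 0.5 * y[t - 1] + rng.standard_normal()

print(granger_f_stat(y, z, p=2))   # test of "y does not Granger cause z"
print(granger_f_stat(z, y, p=2))   # test of "z does not Granger cause y"
```

Each statistic is compared with the critical value of the $F$ distribution with `df_num` and `df_den` degrees of freedom.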

Note that Granger causality is a weaker condition than the condition for exogeneity. A necessary condition for the exogeneity of $\{z_t\}$ is for current and past values of $\{y_t\}$ to not affect $\{z_t\}$. To explain, reconsider the VMA model. In our previous example of the two-variable VMA model, $\{y_t\}$ does not Granger cause $\{z_t\}$ if and only if all coefficients of $\phi_{21}(i) = 0$ for $i > 0$. To sketch the proof, suppose that all coefficients of $\phi_{21}(i)$ are zero for $i > 0$. Hence, $z_{t+1}$ is given by

$$z_{t+1} = \bar{z} + \phi_{21}(0)\varepsilon_{yt+1} + \sum_{i=0}^{\infty} \phi_{22}(i)\varepsilon_{zt+1-i}$$

If we forecast $z_{t+1}$ conditional on the value of $z_t$, we obtain the forecast error $\phi_{21}(0)\varepsilon_{yt+1} + \phi_{22}(0)\varepsilon_{zt+1}$. Given the past value of $z_t$, information concerning past values of $y_t$ does not aid in forecasting $z_t$. In other words, for the VAR(1) model under consideration, $E_t(z_{t+1} \mid z_t) = E_t(z_{t+1} \mid z_t, y_t)$.

The only additional information contained in $y_t$ are the past values of $\{\varepsilon_{yt}\}$. However, such values do not affect $z_t$ and so cannot improve on the forecasting performance of the $z_t$ sequence. Thus, $\{y_t\}$ does not Granger cause $\{z_t\}$. However, if $\phi_{21}(0)$ is not equal to zero, $\{z_t\}$ is not exogenous to $\{y_t\}$. If $\phi_{21}(0)$ is not zero, pure shocks to $y_{t+1}$ (i.e., $\varepsilon_{yt+1}$) affect the value of $z_{t+1}$ even though the $\{y_t\}$ sequence does not Granger cause the $\{z_t\}$ sequence.
A block exogeneity test is useful for detecting whether to incorporate a variable into a VAR. Given the aforementioned distinction between causality and exogeneity, this multivariate generalization of the Granger causality test should actually be called a “block causality” test. In any event, the issue is to determine whether lags of one variable, say $w_t$, Granger cause any other of the variables in the system. In the three-variable case with $w_t$, $y_t$, and $z_t$, the test is whether lags of $w_t$ Granger cause either $y_t$ or $z_t$. In essence, the block exogeneity restricts all lags of $w_t$ in the $y_t$ and $z_t$ equations to be equal to zero. This cross-equation restriction is properly tested using the likelihood ratio test given by (5.45). Estimate the $y_t$ and $z_t$ equations using $p$ lagged values of $\{y_t\}$, $\{z_t\}$, and $\{w_t\}$ and calculate $\Sigma_u$. Reestimate the two equations excluding the lagged values of $\{w_t\}$ and calculate $\Sigma_r$. Next, find the likelihood ratio statistic:

$$(T - c)(\log|\Sigma_r| - \log|\Sigma_u|)$$

As in (5.45), this statistic has a $\chi^2$ distribution with degrees of freedom equal to $2p$ (since $p$ lagged values of $\{w_t\}$ are excluded from each equation). Here, $c = 3p + 1$ since the two unrestricted $y_t$ and $z_t$ equations contain $p$ lags of $\{y_t\}$, $\{z_t\}$, and $\{w_t\}$ plus a constant.
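
The three-variable block exogeneity test can be sketched as follows. The function name and the simulated data are hypothetical; the statistic, the value of $c$, and the degrees of freedom follow the description above.

```python
import numpy as np

def block_exogeneity_stat(w, y, z, p):
    """Likelihood ratio statistic for excluding p lags of w from the y and z equations."""
    w, y, z = (np.asarray(v, dtype=float) for v in (w, y, z))
    total = len(y)

    def lags(v):
        return np.column_stack([v[p - k: total - k] for k in range(1, p + 1)])

    const = np.ones((total - p, 1))
    Y = np.column_stack([y[p:], z[p:]])               # the two equations being tested
    X_u = np.hstack([const, lags(y), lags(z), lags(w)])
    X_r = np.hstack([const, lags(y), lags(z)])

    def logdet_resid_cov(X):
        coef = np.linalg.lstsq(X, Y, rcond=None)[0]
        resid = Y - X @ coef
        return np.linalg.slogdet(resid.T @ resid / len(Y))[1]

    T = len(Y)
    c = 3 * p + 1                                     # regressors in each unrestricted equation
    stat = (T - c) * (logdet_resid_cov(X_r) - logdet_resid_cov(X_u))
    return stat, 2 * p                                # statistic and degrees of freedom

# Simulated data in which lagged w enters the y equation (illustration only).
rng = np.random.default_rng(0)
w = np.zeros(300)
y = np.zeros(300)
z = np.zeros(300)
for t in range(1, 300):
    w[t] = 0.5 * w[t - 1] + rng.standard_normal()
    y[t] = 0.3 * y[t - 1] + 0.5 * w[t - 1] + rng.standard_normal()
    z[t] = 0.3 * z[t - 1] + rng.standard_normal()

print(block_exogeneity_stat(w, y, z, p=2))
```

A value above the $\chi^2$ critical value with $2p$ degrees of freedom indicates that lags of $w_t$ belong in the system.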


9. EXAMPLE OF A SIMPLE VAR: TERRORISM AND 
TOURISM IN SPAIN 


In Enders and Sandler (1991), we used the VAR methodology to estimate the im- 
pact of terrorism on tourism in Spain during the period from 1970 to 1988. Most 
transnational terrorist incidents in Spain during this period were perpetrated by left-wing groups, which included the Anti-Fascist Resistance Group of October 1
(GRAPO), the ETA, the now defunct International Revolutionary Armed Front 
(FRAP), and Iraultza. Most incidents are attributed to the ETA (Basque Fatherland 
and Liberty) and its splinter groups, such as the Autonomous Anti-Capitalist 
Commandos (CAA). Right-wing terrorist groups included the Anti-Terrorist Liber- 
ation Group (GAL), Anti-Terrorism ETA, and Warriors of Christ the King. Catalan 
independence groups, such as Free Land (Terra Lliure) and Catalan Socialist Party 
for National Liberation, have been active in the late 1980s and often target U.S. 
businesses. 

The transfer function model of Section 3 may not be appropriate because of feed- 
back between terrorism and tourism. If high levels of tourism induce terrorist activ- 
ities, the basic assumption of the transfer function methodology is violated. In fact, 
there is some evidence that the terrorist organizations in Spain target tourist hotels 
in the summer season. Since increases in tourism may generate terrorist acts, the
VAR methodology allows us to examine the reactions of tourists to terrorism and 
those of terrorists to tourism. We can gain some additional insights into the interre- 
lation between the two series by performing causality tests of terrorism on tourism 
and of tourism on terrorism. Impulse response analysis can quantify and graphically 
depict the time path of the effects of a typical terrorist incident on tourism. 

We assembled a time series of all publicly available transnational terrorist incidents that took place in Spain from 1970 through 1988. In total, there are 228 months of observation in the time series; each observation is the number of terrorist incidents occurring that month. The tourism data are taken from various issues of the National Statistics Institute’s (Instituto Nacional de Estadística) quarterly reports. In particular, we assembled a time series of the number of foreign tourists per month in Spain for the 1970 to 1988 period.


Empirical Methodology 


Our basic methodology involves estimating tourism and terrorism in a vector au- 
toregression (VAR) framework. Consider the following system of equations: 


$$n_t = \alpha_{10} + A_{11}(L)n_{t-1} + A_{12}(L)i_{t-1} + e_{1t} \tag{5.46}$$
$$i_t = \alpha_{20} + A_{21}(L)n_{t-1} + A_{22}(L)i_{t-1} + e_{2t} \tag{5.47}$$

where $n_t$ = the number of tourists visiting Spain during time period $t$
$i_t$ = the number of transnational terrorist incidents in Spain during $t$
$\alpha_{i0}$ = the 1 × 13 vectors containing a constant, 11 seasonal (monthly) dummy variables, and a time trend
$A_{ij}$ = the polynomials in the lag operator $L$
$e_{it}$ = independent and identically distributed disturbance terms such that $E(e_{1t}e_{2t})$ is not necessarily zero

Although Sims (1980) and Doan (1992) recommend against the use of a deterministic time trend, we decided not to heed their advice. We experimented with several alternative ways to model the series; the model including the time trend yielded the best diagnostic statistics. Other variants included differencing (5.46) and (5.47) and simply eliminating the trend and letting the random walk plus drift terms capture any nonstationary behavior. Questions 5 and 6 at the end of this chapter ask you to compare these alternative ways of estimating a VAR.

The polynomials $A_{12}(L)$ and $A_{21}(L)$ in (5.46) and (5.47) are of particular interest. If all the coefficients of $A_{21}(L)$ are zero, then knowledge of the tourism series does not reduce the forecast error variance of terrorist incidents. Formally, tourism would not Granger cause terrorism. Unless there is a contemporaneous response of terrorism to tourism, the terrorism series evolves independently of tourism. In the same way, if all the coefficients of $A_{12}(L)$ are zero, then terrorism does not Granger cause tourism. The absence of a statistically significant contemporaneous correlation of the error terms would then imply that terrorism cannot affect tourism. If, instead, any of the coefficients in these polynomials differ from zero, there are interactions between the two series. In case of negative coefficients of $A_{12}(L)$, terrorism would have a negative effect on the number of foreign tourist visits to Spain.

Each equation was estimated using lag lengths of 24, 12, 6, and 3 months (i.e., for four estimations, we set L = 24, 12, 6, and 3). Because each equation has identical right-hand-side variables, ordinary least squares (OLS) is an efficient estimation technique. Using $\chi^2$ tests, we determined that a lag length of 12 months was most appropriate (reducing the length from 24 to 12 months had a $\chi^2$ value that was significant at the 0.56 level, whereas reducing the lag length to 6 months had a $\chi^2$ value that was significant at the 0.049 level). The AIC indicated that 12 lags were appropriate, whereas the SBC suggested we could use only six lags. Since we were using monthly data, we decided to use the 12 lags.

To ascertain the importance of the interactions between the two series, we obtained the variance decompositions. The moving average representations of Equations (5.46) and (5.47) express $n_t$ and $i_t$ as dependent on the current and past values of both the $\{e_{1t}\}$ and $\{e_{2t}\}$ sequences:

$$n_t = c_0 + \sum_{j=1}^{\infty}(c_{1j}e_{1t-j} + c_{2j}e_{2t-j}) + e_{1t} \tag{5.48}$$

$$i_t = d_0 + \sum_{j=1}^{\infty}(d_{1j}e_{1t-j} + d_{2j}e_{2t-j}) + e_{2t} \tag{5.49}$$

where $c_0$ and $d_0$ are vectors containing constants, the 11 seasonal dummies, and a trend; and $c_{1j}$, $c_{2j}$, $d_{1j}$, and $d_{2j}$ are parameters.

Because we cannot estimate (5.48) and (5.49) directly, we used the residuals of (5.46) and (5.47) and then decomposed the variances of $n_t$ and $i_t$ into the percentages attributable to each type of innovation. We used the orthogonalized innovations obtained from a Choleski decomposition; the order of the variables in the factorization had no qualitative effects on our results (the contemporaneous correlation between $e_{1t}$ and $e_{2t}$ was -0.0176).


Empirical Results 


Using a 24-month forecasting horizon, the variance decompositions are reported in Table 5.3, in which the significance levels are in parentheses. As expected, each time series explains the preponderance of its own past values; $n_t$ explains over 91% of its forecast error variance, whereas $i_t$ explains nearly 98% of its forecast error variance. It is interesting that terrorist incidents explain 8.7% of the forecast error variance of Spain’s tourism, whereas tourism explains only 2.2% of the forecast error variance of terrorist incidents. More important, Granger causality tests indicate that the effects of terrorism on tourism are significant at the 0.006 level, whereas the effects of tourism on terrorism are not significant at conventional levels. Thus, causality is unidirectional: Terrorism affects tourism but not the reverse. We also note that the terrorism series appears to be autonomous in the sense that neither series Granger causes $i_t$ at conventional levels. This result is consistent with the notion that terrorists randomize their incidents, so that any one incident is not predictable on a month-to-month basis.

Forecasts from an unrestricted VAR are known to suffer from overparameterization. Given the results of the variance decompositions and Granger causality tests, we reestimated (5.46) and (5.47) restricting all the coefficients of $A_{21}(L)$ to zero. Because the right-hand variables were no longer identical, we reestimated the equations with seemingly unrelated regressions (SUR). With the resulting coefficients from the SUR estimates, the effects of a typical terrorist incident on Spain’s tourism can be depicted. In terms of the restricted version of (5.49), we set all $e_{1t-j}$ and $e_{2t-j}$ equal to zero for $j > 0$. We then simulated the time paths resulting from the effects of a one-unit shock to $e_{2t}$. The time path is shown in Figure 5.7, where the vertical axis measures the monthly impact on the number of foreign tourists and the horizontal axis the months following the shock. To smooth out the series, we present the time path of a 3-month moving average of the simulated tourism response function.


Table 5.3 Variance Decomposition Percentage of 24-Month Error Variance

Percent of forecast        Typical shock in
error variance in          n_t               i_t
n_t                        91.3              8.7
                           (3 × E-15)        (0.006)
i_t                        2.2               97.8
                           (17.2)            (93.9)

Note: The numbers in parentheses indicate the significance level for the
joint hypothesis that all lagged coefficients of the variable in question
can be set equal to zero.


Figure 5.7 Tourism response to a terrorist incident (3-month moving average). [Plot not reproduced: vertical axis, tourists; horizontal axis, months 1 through 35.]

After a “typical” terrorist incident, tourism to Spain begins to decline in the third month. After the sixth month, tourism begins to revert to its original level. There does appear to be a rebound in months 8 and 9. There follows another drop in tourism in month 9, reaching the maximum decline about 1 year after the original incident. Obviously, some of this pattern is due to the seasonality in the series. However, tourism slowly recovers and generally remains below its preincident level for a substantial period of time. Aggregating all 36 monthly impacts, we estimate that the combined effect of a typical transnational terrorist incident in Spain is to decrease the total number of foreign visits by 140,847 people. By comparison, a total of 5,392,000 tourists visited Spain in 1988 alone.
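
The simulation procedure described above (set all past shocks to zero, apply a one-unit shock, trace the path, smooth it with a 3-month moving average, and sum the monthly impacts) can be sketched as follows. The coefficient matrices below are hypothetical numbers chosen only to make the mechanics visible; they are not the SUR estimates from Enders and Sandler (1991), and the function names are my own.

```python
import numpy as np

def impulse_response(A, shock, horizon):
    """Trace the effect of a one-time shock through x_t = A_1 x_{t-1} + ... + A_p x_{t-p}.

    A is a list of (n x n) lag matrices; all other shocks are set to zero.
    Row t of the result is the response t periods after the shock.
    """
    n, p = len(shock), len(A)
    path = np.zeros((horizon, n))
    path[0] = shock
    for t in range(1, horizon):
        for k in range(1, min(p, t) + 1):
            path[t] += A[k - 1] @ path[t - k]
    return path

def moving_average(x, window=3):
    """Simple moving average; the output is shorter than x by window - 1."""
    return np.convolve(x, np.ones(window) / window, mode="valid")

# Hypothetical restricted system: variable 0 is tourism, variable 1 is terrorism.
# The zeros in the second row impose A_21(L) = 0 (tourism does not affect terrorism).
A = [np.array([[0.4, -2000.0],
               [0.0,     0.2]]),
     np.array([[0.3, -1500.0],
               [0.0,     0.1]])]

path = impulse_response(A, shock=np.array([0.0, 1.0]), horizon=36)
tourism = path[:, 0]
smoothed = moving_average(tourism, window=3)
cumulative_effect = tourism.sum()      # total change in visits over 36 months
print(smoothed[:6].round(1))
print(round(cumulative_effect, 1))
```

Summing the unsmoothed monthly responses gives the cumulative effect; the moving average is used only for presentation.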


10. STRUCTURAL VARs 


Sims’ (1980) VAR approach has the desirable property that all variables are treated 
symmetrically, so that the econometrician does not rely on any “incredible identifi- 
cation restrictions.” A VAR can be quite helpful in examining the relationships 
among a set of economic variables. Moreover, the resulting estimates can be used 


for forecasting purposes. Consider a first-order VAR system of the type represented 
by (5.21): 


$$x_t = A_0 + A_1 x_{t-1} + e_t$$

Although the VAR approach yields only estimated values of $A_0$ and $A_1$, for exposition purposes, it is useful to treat each as being known. As we saw in (5.43), the $n$-step ahead forecast error is

$$x_{t+n} - E_t x_{t+n} = e_{t+n} + A_1 e_{t+n-1} + A_1^2 e_{t+n-2} + \cdots + A_1^{n-1} e_{t+1} \tag{5.50}$$


Even though the model is underidentified, an appropriately specified model will have forecasts that are unbiased and have minimum variance. Of course, if we had a priori information concerning any of the coefficients, it would be possible to improve the precision of the estimates and reduce the forecast-error variance. A researcher interested only in forecasting might want to trim down the overparameterized VAR model. Nonetheless, it should be clear that forecasting with a VAR is a multivariate extension of forecasting using a simple autoregression.

The VAR approach has been criticized as being devoid of any economic content. 
The sole role of the economist is to suggest the appropriate variables to include in 
the VAR. From that point on, the procedure is almost mechanical. Since there is so 
little economic input in a VAR, it should not be surprising that there is little eco- 
nomic content in the results. Of course, innovation accounting does require an or- 
dering of the variables, but the selection of the ordering is generally ad hoc. 

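Unless the underlying structural model can be identified from the reduced-form VAR model, the innovations in a Choleski decomposition do not have a direct economic interpretation. Reconsider the two-variable VAR of (5.19) and (5.20):

$$y_t + b_{12}z_t = b_{10} + \gamma_{11}y_{t-1} + \gamma_{12}z_{t-1} + \varepsilon_{yt}$$
$$b_{21}y_t + z_t = b_{20} + \gamma_{21}y_{t-1} + \gamma_{22}z_{t-1} + \varepsilon_{zt}$$

so that it is possible to write the model in the form of (5.22a) and (5.22b):

$$y_t = a_{10} + a_{11}y_{t-1} + a_{12}z_{t-1} + e_{1t}$$
$$z_t = a_{20} + a_{21}y_{t-1} + a_{22}z_{t-1} + e_{2t}$$

where the various $a_{ij}$ are defined as in (5.21). For our purposes, the important point to note is that the two error terms $e_{1t}$ and $e_{2t}$ are actually composites of the underlying shocks $\varepsilon_{yt}$ and $\varepsilon_{zt}$. From (5.23) and (5.24),

$$\begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix} = \frac{1}{1 - b_{12}b_{21}} \begin{bmatrix} 1 & -b_{12} \\ -b_{21} & 1 \end{bmatrix} \begin{bmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{bmatrix}$$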

Although these composite shocks are the one-step ahead forecast errors in $y_t$ and $z_t$, they do not have a structural interpretation. Hence, there is an important difference between using VARs for forecasting and using them for economic analysis. In (5.50), $e_{1t}$ and $e_{2t}$ are forecast errors. If we are interested only in forecasting, the components of the forecast errors are unimportant. Given the economic model of (5.19) and (5.20), $\varepsilon_{yt}$ and $\varepsilon_{zt}$ are the autonomous changes in $y_t$ and $z_t$ in period $t$, respectively. If we want to obtain an impulse response function or a variance decomposition to trace out the effects of an innovation in $y_t$ or $z_t$, it is necessary to use the structural shocks (i.e., $\varepsilon_{yt}$ and $\varepsilon_{zt}$), not the forecast errors. The aim of a structural VAR is to use economic theory (rather than the Choleski decomposition) to recover the structural innovations from the residuals $\{e_{1t}\}$ and $\{e_{2t}\}$.

The Choleski decomposition actually makes a strong assumption about the underlying structural errors. Suppose, as in (5.32), we select an ordering such that $b_{21} = 0$. With this assumption, the two pure innovations can be recovered as

$$\varepsilon_{zt} = e_{2t}$$

and

$$e_{1t} = \varepsilon_{yt} - b_{12}\varepsilon_{zt}$$

Forcing $b_{21} = 0$ is equivalent to assuming that an innovation in $y_t$ does not have a contemporaneous effect on $z_t$. Unless there is a theoretical foundation for this assumption, the underlying shocks are improperly identified. As such, the impulse responses and variance decompositions resulting from this improper identification can be quite misleading.

If the correlation coefficient between $e_{1t}$ and $e_{2t}$ is low, the ordering is not likely
to be important. However, in a VAR with several variables, it is improbable that all 
correlations will be small. After all, in selecting the variables to include in a model, 
you are likely to choose variables that exhibit strong comovements. When the 
residuals of a VAR are correlated, it is not practical to try all alternative orderings. 
With a four-variable model, there are 24 (i.e., 4!) possible orderings. 

Sims (1986) and Bernanke (1986) propose modeling the innovations using economic analysis. To understand the procedure, it is useful to examine the relationship between the forecast errors and structural innovations in an $n$-variable VAR.
Since this relationship is invariant to lag length, consider the first-order model with 
n variables: 


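$$
\begin{bmatrix}
1 & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & 1 & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{n1} & b_{n2} & b_{n3} & \cdots & 1
\end{bmatrix}
\begin{bmatrix} x_{1t} \\ x_{2t} \\ \vdots \\ x_{nt} \end{bmatrix}
=
\begin{bmatrix} b_{10} \\ b_{20} \\ \vdots \\ b_{n0} \end{bmatrix}
+
\begin{bmatrix}
\gamma_{11} & \gamma_{12} & \gamma_{13} & \cdots & \gamma_{1n} \\
\gamma_{21} & \gamma_{22} & \gamma_{23} & \cdots & \gamma_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\gamma_{n1} & \gamma_{n2} & \gamma_{n3} & \cdots & \gamma_{nn}
\end{bmatrix}
\begin{bmatrix} x_{1t-1} \\ x_{2t-1} \\ \vdots \\ x_{nt-1} \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \vdots \\ \varepsilon_{nt} \end{bmatrix}
$$

or in compact form,

$$Bx_t = \Gamma_0 + \Gamma_1 x_{t-1} + \varepsilon_t$$

Equation (5.21) is obtained by premultiplying by $B^{-1}$ to obtain

$$x_t = B^{-1}\Gamma_0 + B^{-1}\Gamma_1 x_{t-1} + B^{-1}\varepsilon_t$$

Defining $A_0 = B^{-1}\Gamma_0$, $A_1 = B^{-1}\Gamma_1$, and $e_t = B^{-1}\varepsilon_t$ yields the multivariate generalization of (5.21). The problem, then, is to take the observed values of $e_t$ and restrict the system so as to recover $\varepsilon_t$ as $\varepsilon_t = Be_t$. However, the selection of the various $b_{ij}$ cannot be completely arbitrary. The issue is to restrict the system so as to (1) recover the various $\{\varepsilon_{it}\}$ and (2) preserve the assumed error structure concerning the independence of the various $\{\varepsilon_{it}\}$ shocks. To solve this identification problem, simply count equations and unknowns. Using OLS, we can obtain the variance/covariance matrix $\Sigma$: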


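$$
\Sigma = \begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2
\end{bmatrix}
$$

where each element of $\Sigma$ is constructed as the sum:

$$\sigma_{ij} = (1/T)\sum_{t=1}^{T} e_{it}e_{jt}$$

Since $\Sigma$ is symmetric, it contains only $(n^2 + n)/2$ distinct elements. There are $n$ elements along the principal diagonal, $n - 1$ along the first off-diagonal, $n - 2$ along the next off-diagonal, ..., and one corner element for a total of $(n^2 + n)/2$ free elements.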

Given that the diagonal elements of $B$ are all unity, $B$ contains $n^2 - n$ unknown values. In addition, there are the $n$ unknown values var($\varepsilon_{it}$) for a total of $n^2$ unknown values in the structural model [i.e., the $n^2 - n$ values of $B$ plus the $n$ values var($\varepsilon_{it}$)]. Now, the answer to the identification problem is simple; in order to identify the $n^2$ unknowns from the known $(n^2 + n)/2$ independent elements of $\Sigma$, it is necessary to impose an additional $n^2 - [(n^2 + n)/2] = (n^2 - n)/2$ restrictions on the system. This result generalizes to a model with $p$ lags: To identify the structural model from an estimated VAR, it is necessary to impose $(n^2 - n)/2$ restrictions on the structural model.
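
The counting argument is easy to tabulate. The helper below is purely illustrative (its name and return format are my own); it reproduces the arithmetic in the paragraph above and also reports the number of elements above the principal diagonal of $B$, which is the number of zero restrictions a triangular (Choleski) scheme would impose.

```python
def identification_count(n):
    """Count unknowns, known covariance elements, and restrictions needed for an n-variable VAR."""
    unknowns_in_B = n * n - n                  # off-diagonal elements of B
    unknown_variances = n                      # var(eps_1), ..., var(eps_n)
    unknowns = unknowns_in_B + unknown_variances
    known = (n * n + n) // 2                   # distinct elements of the symmetric Sigma
    above_diagonal = n * (n - 1) // 2          # zeros imposed by a triangular B
    return {
        "unknowns": unknowns,
        "known_elements": known,
        "restrictions_needed": unknowns - known,
        "triangular_restrictions": above_diagonal,
    }

for n in (2, 3, 4):
    print(n, identification_count(n))
```

For every $n$, the number of restrictions needed and the number imposed by a triangular structure coincide, which is why a triangular scheme delivers exact identification.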

Take a moment to count the number of restrictions in a Choleski decomposition. 
In the system above, the Choleski decomposition requires all elements above the 
principal diagonal to be zero: 


$$
\begin{aligned}
b_{12} = b_{13} = b_{14} = \cdots = b_{1n} &= 0 \\
b_{23} = b_{24} = \cdots = b_{2n} &= 0 \\
b_{34} = \cdots = b_{3n} &= 0 \\
&\;\;\vdots \\
b_{n-1,n} &= 0
\end{aligned}
$$

Hence, there are a total of $(n^2 - n)/2$ restrictions; the system is exactly identified.
To take a specific example, consider the following Choleski decomposition in a 3 variable VAR:

$$e_{1t} = \varepsilon_{1t}$$
$$e_{2t} = c_{21}\varepsilon_{1t} + \varepsilon_{2t}$$
$$e_{3t} = c_{31}\varepsilon_{1t} + c_{32}\varepsilon_{2t} + \varepsilon_{3t}$$

From the previous discussion, you should be able to demonstrate that $\varepsilon_{1t}$, $\varepsilon_{2t}$, and $\varepsilon_{3t}$ can be identified from the estimates of $e_{1t}$, $e_{2t}$, $e_{3t}$ and variance/covariance matrix $\Sigma$. In terms of our previous notation, define matrix $C = B^{-1}$ with elements $c_{ij}$. Hence, $e_t = C\varepsilon_t$. An alternative way to model the relationship between the forecast errors and the structural innovations is

$$e_{1t} = \varepsilon_{1t} + c_{13}\varepsilon_{3t}$$
$$e_{2t} = c_{21}\varepsilon_{1t} + \varepsilon_{2t}$$
$$e_{3t} = c_{32}\varepsilon_{2t} + \varepsilon_{3t}$$

Notice the absence of a triangular structure. Here, the forecast error of each variable is affected by its own structural innovation and the structural innovation in one other variable. Given the $(9 - 3)/2 = 3$ restrictions on $C$, the necessary condition for the exact identification of $B$ and $\varepsilon_t$ is satisfied. However, as illustrated in the next section, imposing $(n^2 - n)/2$ restrictions is not a sufficient condition for exact identification. Unfortunately, the presence of non-linearities means there are no simple rules that guarantee exact identification.
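
For the triangular case, the recovery of the structural shocks can be sketched numerically. The approach below, which rescales NumPy's Cholesky factor so that $C$ has a unit diagonal, is one convenient way to do the calculation and is my assumption rather than the procedure used in the text; the covariance matrix and residuals are simulated for illustration.

```python
import numpy as np

def choleski_structural(Sigma):
    """Factor Sigma = C D C' with C unit lower triangular and D diagonal.

    C plays the role of B^{-1} in e_t = C eps_t, and the diagonal of D
    holds the variances of the structural shocks.
    """
    P = np.linalg.cholesky(Sigma)      # Sigma = P P', P lower triangular
    d = np.diag(P)
    C = P / d                          # divide column j by d[j] to get a unit diagonal
    return C, d ** 2

def recover_shocks(residuals, C):
    """Solve e_t = C eps_t for eps_t, one row per time period."""
    return np.linalg.solve(C, residuals.T).T

# Simulated three-variable residuals (illustration only).
rng = np.random.default_rng(0)
true_C = np.array([[1.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.2, 0.3, 1.0]])
e = rng.standard_normal((200, 3)) @ true_C.T
Sigma = e.T @ e / len(e)

C, var_eps = choleski_structural(Sigma)
eps = recover_shocks(e, C)
print(C.round(3))
print(var_eps.round(3))
print((eps.T @ eps / len(eps)).round(6))   # sample covariance of the recovered shocks
```

By construction the recovered shocks are uncorrelated in sample, which is the assumed error structure the identification scheme has to preserve.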


11. EXAMPLES OF STRUCTURAL DECOMPOSITIONS 


To illustrate a Sims-Bernanke decomposition, suppose there are five residuals for $e_{1t}$ and $e_{2t}$. Although a usable sample size of 5 is unacceptable for estimation purposes, it does allow us to do the necessary calculations in a simple fashion. Thus, suppose that the five error terms are

 t     e_1t     e_2t

 1      1.0      0.5
 2     -0.5     -1.0
 3      0.0      0.0
 4     -1.0     -0.5
 5      0.5      1.0

Since the $\{e_{1t}\}$ and $\{e_{2t}\}$ are regression residuals, their sums are zero. It is simple to verify that $\sigma_1^2 = 0.5$, $\sigma_{12} = \sigma_{21} = 0.4$, and $\sigma_2^2 = 0.5$; hence, the variance/covariance matrix $\Sigma$ is

$$\Sigma = \begin{bmatrix} 0.5 & 0.4 \\ 0.4 & 0.5 \end{bmatrix}$$
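
The variance/covariance matrix can be checked directly from the residuals. In the sketch below, the fourth value of $e_{2t}$ is taken to be -0.5 and the fifth observation to be (0.5, 1.0); these are the values implied by the zero-sum condition and the stated variances and covariance, so treat them as a reconstruction rather than as a quotation.

```python
import numpy as np

# Five residual pairs (e_1t, e_2t); see the note above on the last two rows.
e = np.array([[ 1.0,  0.5],
              [-0.5, -1.0],
              [ 0.0,  0.0],
              [-1.0, -0.5],
              [ 0.5,  1.0]])

T = e.shape[0]
column_sums = e.sum(axis=0)        # regression residuals should sum to zero
Sigma = e.T @ e / T                # element (i, j) is (1/T) * sum over t of e_it * e_jt

print(column_sums)
print(Sigma)
```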
Although the covariance between €,, and €,, is zero, the variances of €,, and €, 


are presumably unknown. Let the variance/covariance matrix of these structural 
shocks be denoted by £,, so that 


5 _ | var(e,) 0 
< 0 var(€, ) 
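The arithmetic behind $\Sigma$ can be checked with a minimal Python sketch. The five residual pairs are the ones used in this example; the function and variable names are illustrative assumptions, not notation from the text.

```python
# Five OLS residual pairs (e_1t, e_2t) from the example in this section.
residuals = [(1.0, 0.5), (-0.5, -1.0), (0.0, 0.0), (-1.0, -0.5), (0.5, 1.0)]


def second_moment_matrix(pairs):
    """Return (1/T) * sum of e_t e_t' as a 2x2 nested list."""
    T = len(pairs)
    s11 = sum(a * a for a, _ in pairs) / T
    s12 = sum(a * b for a, b in pairs) / T
    s22 = sum(b * b for _, b in pairs) / T
    return [[s11, s12], [s12, s22]]


# Regression residuals sum to zero in each column.
column_sums = (sum(a for a, _ in residuals), sum(b for _, b in residuals))
sigma = second_moment_matrix(residuals)

print(column_sums)
print(sigma)
```

Because the residuals have zero mean, dividing the sums of squares and cross products by $T = 5$ gives the variances and the covariance directly.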


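The reason that the covariance terms are equal to zero is that $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are
deemed to be pure structural shocks. Moreover, the variance of each shock is
time-invariant. For notational convenience, the time subscript can be dropped; for
example, $\mathrm{var}(\varepsilon_{1t}) = \mathrm{var}(\varepsilon_{1t-1}) = \cdots = \mathrm{var}(\varepsilon_1)$.

The relationship between the variance/covariance matrix of the forecast errors
(i.e., $\Sigma$) and the variance/covariance matrix of the pure shocks (i.e., $\Sigma_\varepsilon$) is such that
$\Sigma_\varepsilon = B\Sigma B'$. Recall that $e_t$ and $\varepsilon_t$ are the column vectors $(e_{1t}, e_{2t})'$ and
$(\varepsilon_{1t}, \varepsilon_{2t})'$, respectively. Hence,

$$e_t e_t' = \begin{bmatrix} e_{1t}^2 & e_{1t}e_{2t} \\ e_{1t}e_{2t} & e_{2t}^2 \end{bmatrix}$$

so that

$$\Sigma = \frac{1}{T}\sum_{t=1}^{T} e_t e_t' \qquad (5.51)$$

Similarly, $\Sigma_\varepsilon$ is

$$\Sigma_\varepsilon = \frac{1}{T}\sum_{t=1}^{T} \varepsilon_t \varepsilon_t' \qquad (5.52)$$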


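To link the two variance/covariance matrices, note that the relationship between
$e_t$ and $\varepsilon_t$ is such that $\varepsilon_t = Be_t$. Substitute this relationship into (5.52) and recall that the
transpose of a product is the product of the transposes [i.e., $(Be_t)' = e_t'B'$], so that

$$\Sigma_\varepsilon = \frac{1}{T}\sum_{t=1}^{T} B e_t e_t' B'$$

Hence, using (5.51), we obtain

$$\Sigma_\varepsilon = B\Sigma B'$$

By using the specific numbers in the example, it follows that

$$\begin{bmatrix} \mathrm{var}(\varepsilon_1) & 0 \\ 0 & \mathrm{var}(\varepsilon_2) \end{bmatrix} = \begin{bmatrix} 1 & b_{12} \\ b_{21} & 1 \end{bmatrix} \begin{bmatrix} 0.5 & 0.4 \\ 0.4 & 0.5 \end{bmatrix} \begin{bmatrix} 1 & b_{21} \\ b_{12} & 1 \end{bmatrix}$$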

Since both sides of this equation are equivalent, they must be the same element
by element. Carry out the indicated multiplication of $B\Sigma B'$ to obtain

$$
\begin{aligned}
\mathrm{var}(\varepsilon_1) &= 0.5 + 0.8b_{12} + 0.5b_{12}^2 && (5.53) \\
0 &= 0.5b_{21} + 0.4b_{21}b_{12} + 0.4 + 0.5b_{12} && (5.54) \\
0 &= 0.5b_{21} + 0.4b_{12}b_{21} + 0.4 + 0.5b_{12} && (5.55) \\
\mathrm{var}(\varepsilon_2) &= 0.5b_{21}^2 + 0.8b_{21} + 0.5 && (5.56)
\end{aligned}
$$
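The element-by-element expansion can be confirmed numerically. The sketch below multiplies out $B\Sigma B'$ directly and compares the result with the right-hand sides of (5.53) through (5.56) for a few trial values of $b_{12}$ and $b_{21}$; the helper names and the trial values are assumptions chosen only for illustration.

```python
SIGMA = [[0.5, 0.4], [0.4, 0.5]]  # residual variance/covariance matrix of the example


def matmul2(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose2(X):
    return [[X[0][0], X[1][0]], [X[0][1], X[1][1]]]


def sigma_eps_direct(b12, b21):
    """Compute B * SIGMA * B' by explicit matrix multiplication."""
    B = [[1.0, b12], [b21, 1.0]]
    return matmul2(matmul2(B, SIGMA), transpose2(B))


def sigma_eps_expanded(b12, b21):
    """Right-hand sides of equations (5.53) through (5.56)."""
    var1 = 0.5 + 0.8 * b12 + 0.5 * b12 ** 2
    off = 0.5 * b21 + 0.4 * b21 * b12 + 0.4 + 0.5 * b12
    var2 = 0.5 * b21 ** 2 + 0.8 * b21 + 0.5
    return [[var1, off], [off, var2]]


trial_values = [(0.0, -0.8), (1.0, -1.0), (0.3, -1.7)]  # arbitrary test points
max_gap = max(
    abs(sigma_eps_direct(b12, b21)[i][j] - sigma_eps_expanded(b12, b21)[i][j])
    for b12, b21 in trial_values for i in range(2) for j in range(2)
)
print(max_gap)
```

The two off-diagonal elements coincide because $B\Sigma B'$ is symmetric, which is why (5.54) and (5.55) carry the same information.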


As you can see, Equations (5.54) and (5.55) are identical. There are three
independent equations to solve for the four unknowns $b_{12}$, $b_{21}$, $\mathrm{var}(\varepsilon_1)$, and $\mathrm{var}(\varepsilon_2)$. As
we saw in the last section, in a two-variable system, one restriction needs to be
imposed if the structural model is to be identified. Now consider the Choleski
decomposition one more time. If $b_{12} = 0$, we find


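$\mathrm{var}(\varepsilon_1) = 0.5$
$0 = 0.5b_{21} + 0.4$, so that $b_{21} = -0.8$
$0 = 0.5b_{21} + 0.4$, so that again we find $b_{21} = -0.8$
$\mathrm{var}(\varepsilon_2) = 0.5(b_{21})^2 + 0.8b_{21} + 0.5$, so that $\mathrm{var}(\varepsilon_2) = 0.5(0.64) - 0.64 + 0.5 = 0.18$

Using this decomposition, we can recover each $\{\varepsilon_{1t}\}$ and $\{\varepsilon_{2t}\}$ as $\varepsilon_t = Be_t$:

$$\varepsilon_{1t} = e_{1t}$$

and

$$\varepsilon_{2t} = -0.8e_{1t} + e_{2t}$$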


Thus, the identified structural shocks are 


t      \varepsilon_{1t}    \varepsilon_{2t}

1       1.0      -0.3
2      -0.5      -0.6
3       0.0       0.0
4      -1.0       0.3
5       0.5       0.6
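The Choleski calculation with $b_{12} = 0$ can be reproduced in a few lines. This is a minimal sketch that assumes the residuals and $\Sigma$ of the example; the variable names are illustrative.

```python
SIGMA = [[0.5, 0.4], [0.4, 0.5]]
residuals = [(1.0, 0.5), (-0.5, -1.0), (0.0, 0.0), (-1.0, -0.5), (0.5, 1.0)]

# With b12 = 0, the off-diagonal equation 0 = 0.5*b21 + 0.4 pins down b21.
b12 = 0.0
b21 = -SIGMA[0][1] / SIGMA[0][0]

# Structural shocks: eps_t = B e_t with B = [[1, b12], [b21, 1]].
shocks = [(e1 + b12 * e2, b21 * e1 + e2) for e1, e2 in residuals]

T = len(shocks)
var_eps1 = sum(s1 * s1 for s1, _ in shocks) / T
var_eps2 = sum(s2 * s2 for _, s2 in shocks) / T
cov_eps = sum(s1 * s2 for s1, s2 in shocks) / T

print(b21)
for t, (s1, s2) in enumerate(shocks, start=1):
    print(t, round(s1, 2), round(s2, 2))
print(round(var_eps1, 4), round(var_eps2, 4), round(cov_eps, 4))
```

The sample moments of the recovered shocks match the solution of the system: variances of 0.5 and 0.18 and a zero covariance.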




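If you want to take the time, you can verify that $\mathrm{var}(\varepsilon_1) = \sum(\varepsilon_{1t})^2/5 = 0.5$,
$\mathrm{var}(\varepsilon_2) = \sum(\varepsilon_{2t})^2/5 = 0.18$, and $\mathrm{cov}(\varepsilon_{1t}, \varepsilon_{2t}) = \sum\varepsilon_{1t}\varepsilon_{2t}/5 = 0$. Instead, if we impose the
alternative restriction of a Choleski decomposition and set $b_{21} = 0$, from (5.53) through
(5.56), we obtain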


$\mathrm{var}(\varepsilon_1) = 0.5 + 0.8b_{12} + 0.5b_{12}^2$
$0 = 0.4 + 0.5b_{12}$, so that $b_{12} = -0.8$
$0 = 0.4 + 0.5b_{12}$, so again $b_{12} = -0.8$
$\mathrm{var}(\varepsilon_2) = 0.5$


Since $b_{12} = -0.8$, $\mathrm{var}(\varepsilon_1) = 0.5 + 0.8(-0.8) + 0.5(0.64) = 0.18$. Now, $B$ is identified as

$$B = \begin{bmatrix} 1 & -0.8 \\ 0 & 1 \end{bmatrix}$$

If we use the identified values of $B$, the structural innovations are such that
$\varepsilon_{1t} = e_{1t} - 0.8e_{2t}$ and $\varepsilon_{2t} = e_{2t}$. Hence, we have the structural innovations:


t      \varepsilon_{1t}    \varepsilon_{2t}

1       0.6       0.5
2       0.3      -1.0
3       0.0       0.0
4      -0.6      -0.5
5      -0.3       1.0


In this example, the ordering used in the Choleski decomposition is very important.
This should not be too surprising since the correlation coefficient between $e_{1t}$
and $e_{2t}$ is 0.8. The point is that the ordering will have important implications for the
resulting variance decompositions and impulse response functions. Selecting the
first ordering (i.e., setting $b_{12} = 0$) gives more importance to $\varepsilon_{1t}$ shocks.
The assumed timing is such that $\varepsilon_{1t}$ can have a contemporaneous effect on
$x_{1t}$ and $x_{2t}$, whereas $\varepsilon_{2t}$ shocks can affect $x_{1t}$ only with a one-period lag. Moreover,
the amplitude of the impulse responses attributable to $\varepsilon_{1t}$ shocks will be increased
since the ordering increases the magnitude of a “typical” (i.e., one standard deviation)
shock in $\varepsilon_{1t}$ and decreases the magnitude of a “typical” $\varepsilon_{2t}$ shock.
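To see how strongly the ordering matters here, the following sketch compares the size of a "typical" (one standard deviation) shock under the two Choleski orderings. It uses the $\Sigma$ of the example; the function name and the closed-form shortcut for the second shock's variance (the part of its residual variance not explained by the first residual) are conveniences of this sketch rather than the text's notation.

```python
import math

SIGMA = [[0.5, 0.4], [0.4, 0.5]]


def choleski_shock_variances(sigma, first):
    """Structural variances [var(eps1), var(eps2)] when variable `first` (0 or 1)
    is placed first in the Choleski ordering."""
    second = 1 - first
    variances = [0.0, 0.0]
    variances[first] = sigma[first][first]
    # Remaining variance of the second residual after removing the part
    # correlated with the first residual.
    variances[second] = sigma[second][second] - sigma[0][1] ** 2 / sigma[first][first]
    return variances


correlation = SIGMA[0][1] / math.sqrt(SIGMA[0][0] * SIGMA[1][1])
ordering_1 = choleski_shock_variances(SIGMA, first=0)  # b12 = 0
ordering_2 = choleski_shock_variances(SIGMA, first=1)  # b21 = 0

typical_1 = [math.sqrt(v) for v in ordering_1]
typical_2 = [math.sqrt(v) for v in ordering_2]

print(round(correlation, 3))
print([round(s, 3) for s in typical_1])
print([round(s, 3) for s in typical_2])
```

Whichever shock is ordered first inherits the full residual variance of 0.5, while the other is left with 0.18, so the two orderings simply swap the sizes of the typical shocks.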

The important point to note is that the Choleski decomposition is only one type of
identification restriction. With three independent equations among the four
unknowns $b_{12}$, $b_{21}$, $\mathrm{var}(\varepsilon_{1t})$, and $\mathrm{var}(\varepsilon_{2t})$, any other linearly independent restriction will
allow for the identification of the structural model. Consider some of the other
alternatives:


1. A Coefficient Restriction. Suppose that we know that a one-unit innovation
$\varepsilon_{2t}$ has a one-unit effect on $x_{1t}$; hence, suppose we know that $b_{12} = 1$. By using
the other three independent equations, it follows that $\mathrm{var}(\varepsilon_{1t}) = 1.8$, $b_{21} = -1$, and
$\mathrm{var}(\varepsilon_{2t}) = 0.2$. Given that $\varepsilon_t = Be_t$, we obtain

$$\begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}$$

so that $\varepsilon_{1t} = e_{1t} + e_{2t}$ and $\varepsilon_{2t} = -e_{1t} + e_{2t}$. If we use the five hypothetical
regression residuals, the decomposed innovations become:

t      \varepsilon_{1t}    \varepsilon_{2t}

1       1.5      -0.5
2      -1.5      -0.5
3       0.0       0.0
4      -1.5       0.5
5       1.5       0.5


2. A Variance Restriction. Given the relationship between $\Sigma_\varepsilon$ and $\Sigma$ (i.e., $\Sigma_\varepsilon = B\Sigma B'$),
a restriction on the variances contained within $\Sigma_\varepsilon$ will always imply multiple
solutions for the coefficients of $B$. To keep the arithmetic simple, suppose that
we know $\mathrm{var}(\varepsilon_{1t}) = 1.8$. The first equation yields two possible solutions, $b_{12} = 1$
and $b_{12} = -2.6$; unless we have a theoretical reason to discard one of these
magnitudes, there are two solutions to the model. Thus, even in a simple
two-variable case, unique identification is not always possible. If $b_{12} = 1$, the
remaining solutions are $b_{21} = -1$ and $\mathrm{var}(\varepsilon_{2t}) = 0.2$. If $b_{12} = -2.6$, the solutions are
$b_{21} = -1.667$ and $\mathrm{var}(\varepsilon_{2t}) = 0.556$.
The two solutions can be used to identify two different $\{\varepsilon_{1t}\}$ and $\{\varepsilon_{2t}\}$
sequences, and innovation accounting can be performed using both solutions. Even
though there are two solutions, both satisfy the theoretical restriction concerning
$\mathrm{var}(\varepsilon_{1t})$.


3. Symmetry Restrictions. A linear combination of the coefficients and variances
can be used for identification purposes. For example, the symmetry restriction
$b_{12} = b_{21}$ can be used for identification. If we use Equation (5.54), there are two
solutions: $b_{12} = b_{21} = -0.5$ or $b_{12} = b_{21} = -2.0$. For the first solution, $\mathrm{var}(\varepsilon_{1t}) = 0.225$,
and using the second solution, we get $\mathrm{var}(\varepsilon_{1t}) = 0.9$.¹⁰
Nevertheless, for the first solution,

$$\begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix} = \begin{bmatrix} 1 & -0.5 \\ -0.5 & 1 \end{bmatrix} \begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}$$

so that

t      \varepsilon_{1t}    \varepsilon_{2t}

1       0.75      0
2       0        -0.75
3       0         0
4      -0.75      0
5       0         0.75
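The three alternative restrictions can be worked through with the same three independent equations. The sketch below is a minimal illustration using the $\Sigma$ of the example; the function names are assumptions of the sketch, and each function simply restates one of equations (5.53), (5.54), and (5.56).

```python
import math

S11, S12, S22 = 0.5, 0.4, 0.5  # elements of the residual variance/covariance matrix


def var_eps1(b12):
    """Equation (5.53)."""
    return S11 + 2 * S12 * b12 + S22 * b12 ** 2


def var_eps2(b21):
    """Equation (5.56)."""
    return S11 * b21 ** 2 + 2 * S12 * b21 + S22


def b21_from_b12(b12):
    """Solve the off-diagonal equation (5.54) for b21 given b12."""
    return -(S12 + S22 * b12) / (S11 + S12 * b12)


def real_roots(a, b, c):
    """Both real roots of a*x**2 + b*x + c = 0 (a > 0), larger root first."""
    root = math.sqrt(b * b - 4 * a * c)
    return [(-b + root) / (2 * a), (-b - root) / (2 * a)]


# 1. Coefficient restriction: b12 = 1.
b21_coef = b21_from_b12(1.0)
coefficient = (1.0, b21_coef, var_eps1(1.0), var_eps2(b21_coef))

# 2. Variance restriction: var(eps1) = 1.8 is a quadratic in b12.
variance = []
for b12 in real_roots(S22, 2 * S12, S11 - 1.8):
    b21 = b21_from_b12(b12)
    variance.append((b12, b21, var_eps2(b21)))

# 3. Symmetry restriction: b12 = b21 = b turns (5.54) into a quadratic in b.
symmetry = [(b, var_eps1(b)) for b in real_roots(S12, S11 + S22, S12)]

print(coefficient)
print(variance)
print(symmetry)
```

The coefficient restriction gives a single solution, whereas the variance and symmetry restrictions each lead to a quadratic and therefore to two candidate solutions.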


Overidentified Systems 




It may be that economic theory suggests more than $(n^2 - n)/2$ restrictions. If so, it is
necessary to modify the method above. The procedure for identifying an overidentified
system entails the following steps:

STEP 1: The restrictions on $B$ or $\mathrm{var}(\varepsilon_{it})$ do not affect the estimation of VAR
coefficients. Hence, estimate the unrestricted VAR:
$x_t = A_0 + A_1 x_{t-1} + \cdots + A_p x_{t-p} + e_t$.
Use the standard lag length and block causality tests to help determine
the form of the VAR.


STEP 2: Obtain the unrestricted variance/covariance matrix $\Sigma$. The determinant of
this matrix is an indicator of the overall fit of the model.


STEP 3: Restricting $B$ and/or $\Sigma_\varepsilon$ will affect the estimate of $\Sigma$. Select the appropriate
restrictions and maximize the likelihood function with respect to the free
parameters of $B$ and $\Sigma_\varepsilon$. This will lead to an estimate of the restricted
variance/covariance matrix. Denote this second estimate by $\Sigma_R$.

For those wanting a more technical explanation, note that the log likelihood
function is

$$-\frac{T}{2}\ln|\Sigma| - \frac{1}{2}\sum_{t=1}^{T} e_t'\Sigma^{-1}e_t$$

Fix each element of $e_t$ (and $e_t'$) at the level obtained using OLS; call
these estimated OLS residuals $\hat{e}_t$. Now use the relationship $\Sigma_\varepsilon = B\Sigma B'$ so
that the log likelihood function can be written as

$$-\frac{T}{2}\ln\left|B^{-1}\Sigma_\varepsilon (B')^{-1}\right| - \frac{1}{2}\sum_{t=1}^{T} \hat{e}_t' B'\Sigma_\varepsilon^{-1}B\hat{e}_t$$

Now select the restrictions on $B$ and $\Sigma_\varepsilon$ and maximize with respect to
the remaining free elements of these two matrices. The resulting estimates
of $B$ and $\Sigma_\varepsilon$ imply a value of $\Sigma$ that we have dubbed $\Sigma_R$.


STEP 4: If the restrictions are not binding, $\Sigma$ and $\Sigma_R$ will be equivalent. Let $R$ = the
number of overidentifying restrictions; that is, $R$ = number of restrictions
exceeding $(n^2 - n)/2$. Then, the $\chi^2$ test statistic

$$\chi^2 = |\Sigma_R| - |\Sigma|$$

with $R$ degrees of freedom can be used to test the restricted system.¹¹ If the
calculated value of $\chi^2$ exceeds that in a $\chi^2$ table, the restrictions can be
rejected. Now allow for two sets of overidentifying restrictions such that
the number of restrictions in $R_2$ exceeds that in $R_1$. In fact, if $R_2 > R_1 \geq
(n^2 - n)/2$, the significance of the extra $R_2 - R_1$ restrictions can be tested as

$$\chi^2 = |\Sigma_{R_2}| - |\Sigma_{R_1}| \quad \text{with } R_2 - R_1 \text{ degrees of freedom}$$

Similarly, in an overidentified system, the t-statistics for the individual
coefficients can be obtained. Sims warns that the calculated standard
errors may not be very accurate.
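The log likelihood in Step 3 is easy to evaluate for a two-variable system. The sketch below is only an illustration under stated assumptions: it reuses the five residuals of the earlier example as the OLS residuals, takes $\Sigma_\varepsilon$ to be diagonal, and uses an arbitrary overidentifying restriction ($b_{12} = b_{21} = 0$) chosen for demonstration; the function name is hypothetical.

```python
import math

# Assumed OLS residuals: the five pairs of the earlier example.
residuals = [(1.0, 0.5), (-0.5, -1.0), (0.0, 0.0), (-1.0, -0.5), (0.5, 1.0)]


def log_likelihood(b12, b21, var1, var2, e=residuals):
    """-(T/2) ln|B^{-1} S (B')^{-1}| - (1/2) sum e_t' B' S^{-1} B e_t,
    with B = [[1, b12], [b21, 1]] and S = diag(var1, var2)."""
    T = len(e)
    det_B = 1.0 - b12 * b21
    # |B^{-1} S (B')^{-1}| = var1 * var2 / det(B)^2
    log_det_sigma = math.log(var1 * var2) - 2.0 * math.log(abs(det_B))
    quad = 0.0
    for e1, e2 in e:
        eps1 = e1 + b12 * e2          # first element of B e_t
        eps2 = b21 * e1 + e2          # second element of B e_t
        quad += eps1 ** 2 / var1 + eps2 ** 2 / var2
    return -(T / 2.0) * log_det_sigma - 0.5 * quad


# Unrestricted value: Sigma is the sample second-moment matrix, so the
# quadratic form sums to T * n = 10.
unrestricted = -(5 / 2.0) * math.log(0.5 * 0.5 - 0.4 * 0.4) - 0.5 * 10

# Exactly identified Choleski solution (b12 = 0, b21 = -0.8).
exactly_identified = log_likelihood(0.0, -0.8, 0.5, 0.18)

# Hypothetical overidentified model: b12 = b21 = 0, variances at their maximizers.
over_identified = log_likelihood(0.0, 0.0, 0.5, 0.5)

print(round(unrestricted, 4), round(exactly_identified, 4), round(over_identified, 4))
```

An exactly identified system reproduces the unrestricted likelihood, so $\Sigma_R$ equals $\Sigma$; a binding overidentifying restriction lowers the maximized likelihood, which is what the test in Step 4 is designed to detect.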


Sims’ Structural VAR 


Sims (1986) uses a six-variable VAR of quarterly data over the period 1948:1 to
1979:3. The variables included in the study are real GNP ($y$), real business fixed
investment ($i$), the GNP deflator ($p$), the money supply as measured by M1 ($m$),
unemployment ($u$), and the treasury bill rate ($r$). An unrestricted VAR was estimated
with four lags of each variable and a constant term. Sims obtained the 36 impulse
response functions using a Choleski decomposition with the ordering
$y \to i \to p \to m \to u \to r$. Some of the impulse response functions had reasonable interpretations.
However, the response of real variables to a money supply shock seemed unreasonable.
The impulse responses suggested that a money supply shock had little effect
on prices, output, or the interest rate. Given a standard money demand function, it
is hard to explain why the public would be willing to hold the expanded money
supply. Sims proposes an alternative to the Choleski decomposition that is consistent
with money market equilibrium. Sims restricts the $B$ matrix such that


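$$
\begin{bmatrix}
1 & b_{12} & 0 & 0 & 0 & 0 \\
b_{21} & 1 & b_{23} & b_{24} & 0 & 0 \\
b_{31} & 0 & 1 & 0 & 0 & b_{36} \\
b_{41} & 0 & b_{43} & 1 & 0 & b_{46} \\
b_{51} & 0 & b_{53} & b_{54} & 1 & b_{56} \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} r_t \\ m_t \\ y_t \\ p_t \\ u_t \\ i_t \end{bmatrix}
=
\begin{bmatrix} \varepsilon_{rt} \\ \varepsilon_{mt} \\ \varepsilon_{yt} \\ \varepsilon_{pt} \\ \varepsilon_{ut} \\ \varepsilon_{it} \end{bmatrix}
$$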




Notice that there are 17 zero restrictions on the $b_{ij}$; the system is overidentified:
with six variables, exact identification requires only $(6^2 - 6)/2 = 15$ restrictions.
Imposing these 17 restrictions, Sims identifies the following six relationships
among the contemporaneous innovations:


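$$
\begin{aligned}
r_t &= 71.20m_t + \varepsilon_{rt} && (5.57) \\
m_t &= 0.283y_t + 0.224p_t - 0.0081r_t + \varepsilon_{mt} && (5.58) \\
y_t &= -0.00135r_t + 0.132i_t + \varepsilon_{yt} && (5.59) \\
p_t &= -0.0010r_t + 0.045y_t - 0.00364i_t + \varepsilon_{pt} && (5.60) \\
u_t &= -0.116r_t - 20.1y_t - 1.48i_t - 8.98p_t + \varepsilon_{ut} && (5.61) \\
i_t &= \varepsilon_{it} && (5.62)
\end{aligned}
$$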


Sims views (5.57) and (5.58) as money supply and demand functions, respectively.
In (5.57), the money supply rises as the interest rate increases. The demand
for money in (5.58) is positively related to income and the price level and negatively
related to the interest rate. Investment innovations in (5.62) are completely
autonomous. Otherwise, Sims sees no reason to restrict the other equations in any
particular fashion. For simplicity, he chooses a Choleski-type block structure for
GNP, the price level, and the unemployment rate. The impulse response functions
appear to be consistent with the notion that money supply shocks affect prices,
income, and the interest rate.
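The restriction count for this system can be tallied mechanically from equations (5.57) through (5.62). The sketch below records which contemporaneous variables appear on the right-hand side of each equation; the dictionary layout and names are illustrative assumptions.

```python
variables = ["r", "m", "y", "p", "u", "i"]

# Contemporaneous right-hand-side variables in equations (5.57)-(5.62).
rhs = {
    "r": ["m"],
    "m": ["y", "p", "r"],
    "y": ["r", "i"],
    "p": ["r", "y", "i"],
    "u": ["r", "y", "i", "p"],
    "i": [],
}

n = len(variables)
free_coefficients = sum(len(v) for v in rhs.values())
zero_restrictions = n * (n - 1) - free_coefficients   # off-diagonal zeros in B
required = (n * n - n) // 2                           # needed for exact identification
overidentifying = zero_restrictions - required

# Pattern of B: 1 on the diagonal, "b" for a free coefficient, 0 for a restriction.
pattern = [
    [1 if row == col else ("b" if col in rhs[row] else 0) for col in variables]
    for row in variables
]
for row in pattern:
    print(row)
print(free_coefficients, zero_restrictions, required, overidentifying)
```

With 13 free off-diagonal coefficients, the remaining 17 off-diagonal elements are zero restrictions, two more than the 15 needed for exact identification.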


12. THE BLANCHARD AND QUAH DECOMPOSITION 


Blanchard and Quah (1989) provide an alternative way to obtain a structural
identification. Their aim is to reconsider the Beveridge and Nelson (1981) decomposition
of real GNP into its temporary and permanent components. Toward this end, they
develop a macroeconomic model such that real GNP is affected by demand-side
and supply-side disturbances. In accord with the natural rate hypothesis,
demand-side disturbances have no long-run effect on real GNP. On the supply side,
productivity shocks are assumed to have permanent effects on output. In a univariate
model, there is no unique way to decompose a variable into its temporary and
permanent components. However, using a bivariate VAR, Blanchard and Quah show
how to decompose real GNP and recover the two pure shocks.

To take a general example, suppose we are interested in decomposing an $I(1)$
sequence, say $\{y_t\}$, into its temporary and permanent components. In a univariate
framework [recall the discussion concerning Beveridge and Nelson (1981)], there is
no unique way to perform the decomposition. However, let there be a second variable
$\{z_t\}$ that is affected by the same two shocks. For the time being, suppose that
$\{z_t\}$ is stationary. If we ignore the intercept terms, the bivariate moving average
(BMA) representation of the $\{y_t\}$ and $\{z_t\}$ sequences will have the form:


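$$\Delta y_t = \sum_{k=0}^{\infty} c_{11}(k)\varepsilon_{1t-k} + \sum_{k=0}^{\infty} c_{12}(k)\varepsilon_{2t-k} \qquad (5.63)$$

$$z_t = \sum_{k=0}^{\infty} c_{21}(k)\varepsilon_{1t-k} + \sum_{k=0}^{\infty} c_{22}(k)\varepsilon_{2t-k} \qquad (5.64)$$

or in a more compact form,

$$\begin{bmatrix} \Delta y_t \\ z_t \end{bmatrix} = \begin{bmatrix} C_{11}(L) & C_{12}(L) \\ C_{21}(L) & C_{22}(L) \end{bmatrix} \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix}$$

where $\varepsilon_{1t}$ and $\varepsilon_{2t}$ = independent white-noise disturbances, each having a constant
variance

and the $C_{ij}(L)$ are polynomials in the lag operator $L$ such that the individual
coefficients of $C_{ij}(L)$ are denoted by $c_{ij}(k)$. For example, the third coefficient of $C_{21}(L)$ is
$c_{21}(3)$. For convenience, the time subscripts on the variances and covariance terms
are dropped and shocks normalized so that $\mathrm{var}(\varepsilon_1) = 1$ and $\mathrm{var}(\varepsilon_2) = 1$. If we call $\Sigma_\varepsilon$
the variance/covariance matrix of the innovations, it follows that

$$\Sigma_\varepsilon = \begin{bmatrix} \mathrm{var}(\varepsilon_1) & \mathrm{cov}(\varepsilon_1, \varepsilon_2) \\ \mathrm{cov}(\varepsilon_1, \varepsilon_2) & \mathrm{var}(\varepsilon_2) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$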

In order to use the Blanchard and Quah technique, both variables must be in a
stationary form. Since $\{y_t\}$ is $I(1)$, (5.63) uses the first difference of the series. Note
that (5.64) implies that the $\{z_t\}$ sequence is $I(0)$; if in your own work you find that
$\{z_t\}$ is also $I(1)$, use its first difference.

In contrast to the Sims-Bernanke procedure, Blanchard and Quah do not directly
associate the $\{\varepsilon_{1t}\}$ and $\{\varepsilon_{2t}\}$ shocks with the $\{y_t\}$ and $\{z_t\}$ sequences. Instead, the
$\{y_t\}$ and $\{z_t\}$ sequences are the endogenous variables, and the $\{\varepsilon_{1t}\}$ and $\{\varepsilon_{2t}\}$
sequences represent what an economic theorist would call the exogenous variables. In
their example, $y_t$ is the logarithm of real GNP, $z_t$ unemployment, $\varepsilon_{1t}$ an aggregate
demand shock, and $\varepsilon_{2t}$ an aggregate supply shock. The coefficients of $C_{11}(L)$, for
example, represent the impulse responses of an aggregate demand shock on the
time path of change in the log of real GNP.¹²

The key to decomposing the $\{y_t\}$ sequence into its trend and irregular components
is to assume that one of the shocks has a temporary effect on the $\{y_t\}$ sequence
[...] VAR. For example, Blanchard and Quah assume that an aggregate demand shock
has no long-run effect on real GNP. In the long run, if real GNP is to be unaffected
by the demand shock, it must be the case that the cumulated effect of an $\varepsilon_{1t}$ shock
on the $\Delta y_t$ sequence is equal to zero. Hence, the coefficients $c_{11}(k)$ in (5.63)
must be such that

$$\sum_{k=0}^{\infty} c_{11}(k)\varepsilon_{1t-k} = 0 \qquad (5.65)$$
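For a one-time unit shock, the requirement that the cumulated effect on $\Delta y_t$ vanish amounts to the $c_{11}(k)$ summing to zero. The sketch below illustrates the idea with purely hypothetical coefficient values (they are not estimates from any study): the response of the level of $y_t$ is the running sum of the responses of $\Delta y_t$.

```python
from itertools import accumulate

# Hypothetical impulse responses of the first difference to a unit shock.
c11 = [1.0, 0.5, -0.75, -0.5, -0.25]   # demand shock: coefficients sum to zero
c12 = [0.5, 0.3, 0.2]                  # supply shock: coefficients sum to one

# Response of the level is the cumulated response of the first difference.
level_response_demand = list(accumulate(c11))
level_response_supply = list(accumulate(c12))

print(level_response_demand)
print(level_response_supply)
```

The level response to the demand shock returns to zero once all the coefficients have been cumulated, whereas the supply shock leaves the level permanently higher by the sum of its coefficients.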
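Since the demand-side and supply-side shocks are not observed, the problem is
to recover them from a VAR estimation. Given that the variables are stationary, we
know there exists a VAR representation of the form:

$$\begin{bmatrix} \Delta y_t \\ z_t \end{bmatrix} = \begin{bmatrix} A_{11}(L) & A_{12}(L) \\ A_{21}(L) & A_{22}(L) \end{bmatrix} \begin{bmatrix} \Delta y_{t-1} \\ z_{t-1} \end{bmatrix} + \begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}$$

or to use a more compact notation,

$$x_t = A(L)x_{t-1} + e_t \qquad (5.66)$$

where $x_t$ = the column vector $(\Delta y_t, z_t)'$
$e_t$ = the column vector $(e_{1t}, e_{2t})'$
$A(L)$ = the 2 x 2 matrix with elements equal to the polynomials $A_{ij}(L)$

and the coefficients of $A_{ij}(L)$ are denoted by $a_{ij}(k)$.¹³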

The critical insight is that the VAR residuals are composites of the pure innovations
$\varepsilon_{1t}$ and $\varepsilon_{2t}$. For example, $e_{1t}$ is the one-step ahead forecast error of $y_t$; that is,
$e_{1t} = \Delta y_t - E_{t-1}\Delta y_t$. From the BMA, the one-step ahead forecast error is
$c_{11}(0)\varepsilon_{1t} + c_{12}(0)\varepsilon_{2t}$. Since the two representations are equivalent, it must be the case that

$$e_{1t} = c_{11}(0)\varepsilon_{1t} + c_{12}(0)\varepsilon_{2t} \qquad (5.67)$$

Similarly, since $e_{2t}$ is the one-step ahead forecast error of $z_t$,

$$e_{2t} = c_{21}(0)\varepsilon_{1t} + c_{22}(0)\varepsilon_{2t} \qquad (5.68)$$

or, combining (5.67) and (5.68), we get

$$\begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix} = \begin{bmatrix} c_{11}(0) & c_{12}(0) \\ c_{21}(0) & c_{22}(0) \end{bmatrix} \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix}$$

[...] relationship between (5.66) and the BMA model plus the long-run restriction of
(5.65) provide exactly four restrictions that can be used to identify these four
coefficients. The VAR residuals can be used to construct estimates of $\mathrm{var}(e_1)$, $\mathrm{var}(e_2)$,
and $\mathrm{cov}(e_1, e_2)$.¹⁴ Hence, there are the following three restrictions:


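RESTRICTION 1

Given (5.67) and noting that $E\varepsilon_{1t}\varepsilon_{2t} = 0$, we see that the normalization
$\mathrm{var}(\varepsilon_1) = \mathrm{var}(\varepsilon_2) = 1$ means that the variance of $e_{1t}$ is

$$\mathrm{var}(e_1) = c_{11}(0)^2 + c_{12}(0)^2 \qquad (5.69)$$

RESTRICTION 2

Similarly, if we use (5.68), the variance of $e_{2t}$ is related to $c_{21}(0)$ and $c_{22}(0)$
as

$$\mathrm{var}(e_2) = c_{21}(0)^2 + c_{22}(0)^2 \qquad (5.70)$$

RESTRICTION 3

The product of $e_{1t}$ and $e_{2t}$ is

$$e_{1t}e_{2t} = [c_{11}(0)\varepsilon_{1t} + c_{12}(0)\varepsilon_{2t}][c_{21}(0)\varepsilon_{1t} + c_{22}(0)\varepsilon_{2t}]$$

If we take the expectation, the covariance of the VAR residuals is

$$Ee_{1t}e_{2t} = c_{11}(0)c_{21}(0) + c_{12}(0)c_{22}(0) \qquad (5.71)$$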
Thus, equations (5.69), (5.70), and (5.71) can be viewed as three equations in the
four unknowns $c_{11}(0)$, $c_{12}(0)$, $c_{21}(0)$, and $c_{22}(0)$. The fourth restriction is embedded
in the assumption that the $\{\varepsilon_{1t}\}$ sequence has no long-run effect on the $\{y_t\}$ sequence.
The problem is to transform the restriction (5.65) into its VAR representation. Since the
algebra is a bit messy, it is helpful to rewrite (5.66) as

$$x_t = A(L)Lx_t + e_t$$

so that

$$[I - A(L)L]x_t = e_t$$

and by premultiplying by $[I - A(L)L]^{-1}$, we obtain

$$x_t = [I - A(L)L]^{-1}e_t \qquad (5.72)$$




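Denote the determinant of $[I - A(L)L]$ by the expression $D$. It should not take too
long to convince yourself that (5.72) can be written as:

$$\begin{bmatrix} \Delta y_t \\ z_t \end{bmatrix} = \frac{1}{D} \begin{bmatrix} 1 - A_{22}(L)L & A_{12}(L)L \\ A_{21}(L)L & 1 - A_{11}(L)L \end{bmatrix} \begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}$$

or using the definitions of the $A_{ij}(L)$, we get

$$\begin{bmatrix} \Delta y_t \\ z_t \end{bmatrix} = \frac{1}{D} \begin{bmatrix} 1 - \sum a_{22}(k)L^{k+1} & \sum a_{12}(k)L^{k+1} \\ \sum a_{21}(k)L^{k+1} & 1 - \sum a_{11}(k)L^{k+1} \end{bmatrix} \begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}$$

where the summations run from $k = 0$ to infinity.
Thus, the solution for $\Delta y_t$ in terms of the current and lagged values of $\{e_{1t}\}$ and
$\{e_{2t}\}$ is

$$\Delta y_t = \frac{1}{D}\left\{\left[1 - \sum_{k=0}^{\infty} a_{22}(k)L^{k+1}\right]e_{1t} + \sum_{k=0}^{\infty} a_{12}(k)L^{k+1}e_{2t}\right\} \qquad (5.73)$$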


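Now, $e_{1t}$ and $e_{2t}$ can be replaced by (5.67) and (5.68). If we make these substitutions,
the restriction that the $\{\varepsilon_{1t}\}$ sequence has no long-run effect on $y_t$ is

$$\left[1 - \sum_{k=0}^{\infty} a_{22}(k)L^{k+1}\right]c_{11}(0)\varepsilon_{1t} + \sum_{k=0}^{\infty} a_{12}(k)L^{k+1}c_{21}(0)\varepsilon_{1t} = 0$$

RESTRICTION 4

For all possible realizations of the $\{\varepsilon_{1t}\}$ sequence, $\varepsilon_{1t}$ shocks will have only
temporary effects on the $\Delta y_t$ sequence (and $y_t$ itself) if

$$\left[1 - \sum_{k=0}^{\infty} a_{22}(k)\right]c_{11}(0) + \sum_{k=0}^{\infty} a_{12}(k)c_{21}(0) = 0$$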


With this fourth restriction, there are four equations that can be used to identify
the unknown values $c_{11}(0)$, $c_{12}(0)$, $c_{21}(0)$, and $c_{22}(0)$. To summarize, the steps in the
procedure are as follows.


STEP 1: Begin by pretesting the two variables for time trends and unit roots. If $\{y_t\}$
does not have a unit root, there is no reason to proceed with the decomposition.
Appropriately transform the two variables, so that the resulting sequences
are both $I(0)$. Perform lag-length tests to find a reasonable approximation
to the infinite-order VAR. The residuals of the estimated
VAR should pass the standard diagnostic checks for white-noise processes
(of course, $e_{1t}$ and $e_{2t}$ can be correlated with each other).


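STEP 2: Using the residuals of the estimated VAR, calculate the variance/covariance
matrix; that is, calculate $\mathrm{var}(e_1)$, $\mathrm{var}(e_2)$, and $\mathrm{cov}(e_1, e_2)$. Also calculate
the sums:

$$1 - \sum_{k=0}^{p} a_{22}(k) \quad \text{and} \quad \sum_{k=0}^{p} a_{12}(k)$$

where $p$ = lag length used to estimate the VAR.
Use these values to solve the following four equations for $c_{11}(0)$, $c_{12}(0)$,
$c_{21}(0)$, and $c_{22}(0)$:

$$
\begin{aligned}
\mathrm{var}(e_1) &= c_{11}(0)^2 + c_{12}(0)^2 \\
\mathrm{var}(e_2) &= c_{21}(0)^2 + c_{22}(0)^2 \\
\mathrm{cov}(e_1, e_2) &= c_{11}(0)c_{21}(0) + c_{12}(0)c_{22}(0) \\
0 &= c_{11}(0)\left[1 - \sum a_{22}(k)\right] + c_{21}(0)\sum a_{12}(k)
\end{aligned}
$$

Given these four values $c_{ij}(0)$ and the residuals of the VAR $\{e_{1t}\}$ and
$\{e_{2t}\}$, the entire $\{\varepsilon_{1t}\}$ and $\{\varepsilon_{2t}\}$ sequences can be identified using the
formulas:¹⁵

$$e_{1t-i} = c_{11}(0)\varepsilon_{1t-i} + c_{12}(0)\varepsilon_{2t-i}$$

and

$$e_{2t-i} = c_{21}(0)\varepsilon_{1t-i} + c_{22}(0)\varepsilon_{2t-i}$$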


STEP 3: As in a traditional VAR, the identified $\{\varepsilon_{1t}\}$ and $\{\varepsilon_{2t}\}$ sequences can be
used to obtain impulse response functions and variance decompositions.
The difference is that the interpretation of the impulses is straightforward.
For example, Blanchard and Quah are able to obtain the impulse responses
of the change in the log of real GNP to a typical supply-side shock.
Moreover, it is possible to obtain the historical decomposition of each series.
For example, set all $\{\varepsilon_{1t}\}$ shocks equal to zero and use the actual $\{\varepsilon_{2t}\}$
series (i.e., use the identified values of $\varepsilon_{2t}$) to obtain the permanent
changes in $\{y_t\}$ as¹⁶

$$\Delta y_t = \sum_{k=0}^{\infty} c_{12}(k)\varepsilon_{2t-k}$$
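The four equations of Step 2 can be solved in closed form for a bivariate system. The sketch below is one possible approach rather than the procedure used by Blanchard and Quah: it starts from a Choleski factor of the residual variance/covariance matrix and rotates it until the long-run restriction holds. All numerical inputs are hypothetical, the function names are illustrative, and the sign of each column of the solution is a normalization left to the user.

```python
import math


def bq_impact_matrix(var_e1, var_e2, cov_e12, sum_a22, sum_a12):
    """Return C(0) = [[c11, c12], [c21, c22]] satisfying the four equations."""
    # Lower-triangular factor P with P P' equal to the residual covariance matrix.
    p11 = math.sqrt(var_e1)
    p21 = cov_e12 / p11
    p22 = math.sqrt(var_e2 - p21 ** 2)
    # Any C(0) = P Q with Q orthogonal satisfies the three moment equations.
    # Long-run restriction: alpha * c11 + beta * c21 = 0.
    alpha, beta = 1.0 - sum_a22, sum_a12
    w1 = alpha * p11 + beta * p21
    w2 = beta * p22
    norm = math.hypot(w1, w2)  # zero only in the degenerate case alpha = beta = 0
    q1, q2 = -w2 / norm, w1 / norm  # unit vector orthogonal to (w1, w2)
    c11, c21 = p11 * q1, p21 * q1 + p22 * q2
    c12, c22 = -p11 * q2, -p21 * q2 + p22 * q1
    return [[c11, c12], [c21, c22]]


def recover_shocks(C, e1, e2):
    """Invert e_t = C(0) eps_t for a single pair of residuals."""
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    eps1 = (C[1][1] * e1 - C[0][1] * e2) / det
    eps2 = (-C[1][0] * e1 + C[0][0] * e2) / det
    return eps1, eps2


# Hypothetical inputs: residual moments and sums of estimated VAR coefficients.
C0 = bq_impact_matrix(var_e1=0.5, var_e2=0.5, cov_e12=0.4, sum_a22=0.3, sum_a12=0.2)
long_run_check = C0[0][0] * (1.0 - 0.3) + C0[1][0] * 0.2

print(C0)
print(long_run_check)
print(recover_shocks(C0, 1.0, 0.5))
```

Applying `recover_shocks` to every pair of VAR residuals yields the full sequences of structural shocks used in Step 3.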


The Blanchard and Quah Results 


In their study, Blanchard and Quah (1989) use the first difference of the logarithm
of real GNP and the level of unemployment. They note that unemployment exhibits
an apparent time trend and that there is a slowdown in real growth beginning in the
mid-1970s. Since there is no obvious way to address these difficult issues, they
estimate four different VARs. Two include a dummy allowing for the change in the
rate of growth in output and two include a deterministic time trend in unemployment.
Using quarterly GNP and unemployment data over the period 1950:2 through
1987:4, they estimated a VAR with eight lags.

Imposing the restriction that demand-side shocks have no long-run effect on real 
GNP, Blanchard and Quah identify the two types of shocks. The impulse response 
functions for the four VARs are quite similar: 


• The time paths of demand-side disturbances on output and unemployment are
hump-shaped. The impulse responses are mirror images of each other; initially
output increases while unemployment decreases. The effects peak after four
quarters; afterward they converge to their original levels.

• Supply-side disturbances have a cumulative effect on output. A supply disturbance
having a positive effect on output also has a small positive initial effect on
unemployment. After this initial increase, unemployment steadily decreases and
the cumulated change becomes negative after four quarters. Unemployment
remains below its long-run level for nearly 5 years.


Blanchard and Quah find that the alternative methods of treating the slowdown 
in output growth and the trend in unemployment affect the variance decomposi- 
tions. Since the goal here is to illustrate the technique, consider only the variance 
decomposition using a dummy variable for the decline in output growth and de- 
trended unemployment. 


Percent of Forecast Error Variance due to Demand-Side Shocks

Forecasting
Horizon (Quarters)      Output          Unemployment

 1                      99.0            51.9
 4                      [illegible]     80.2
12                      67.6            86.2
40                      39.3            85.6

At short-run horizons, demand-side innovations account for almost all of the
variation in output. Since demand shock effects are necessarily temporary, the
findings contradict those of Beveridge and Nelson. The proportion of the forecast
error variance falls steadily as the forecast horizon increases; the proportion
converges to zero since these effects are temporary. Consequently, the contribution of
supply-side innovations to real GNP movements increases at longer forecasting
horizons. On the other hand, demand-side shocks generally account for increasing
proportions of the variation in unemployment at longer forecasting horizons.
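A variance decomposition of this kind follows directly from the impulse responses once the shocks are normalized to unit variance. The sketch below uses purely hypothetical impulse responses of the level of output (they are not Blanchard and Quah's estimates), and the function name is illustrative.

```python
def demand_share(demand_responses, supply_responses, horizon):
    """Percent of the horizon-step forecast error variance due to demand shocks,
    assuming uncorrelated shocks that each have unit variance."""
    demand = sum(c * c for c in demand_responses[:horizon])
    supply = sum(c * c for c in supply_responses[:horizon])
    return 100.0 * demand / (demand + supply)


# Hypothetical responses of the level of output to one-unit shocks.
demand_level = [1.0, 1.5, 0.75, 0.25, 0.0, 0.0, 0.0, 0.0]   # dies out
supply_level = [0.5, 0.8, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]     # permanent

shares = {h: demand_share(demand_level, supply_level, h) for h in (1, 4, 8)}
for h, share in shares.items():
    print(h, round(share, 1))
```

Because the demand responses die out while the supply responses persist, the demand share of the forecast error variance shrinks as the horizon lengthens, which is the qualitative pattern reported for output.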


13. DECOMPOSING REAL AND NOMINAL EXCHANGE 
RATE MOVEMENTS: AN EXAMPLE 


In Lee and Enders (1993), we decompose real and nominal exchange rate movements
into the components induced by real and nominal factors. This section presents
a small portion of the paper in order to further illustrate the methodology of
the Blanchard and Quah technique. One aim of the study is to explain the deviations
from purchasing power parity. As in Chapter 4, the real exchange rate ($r_t$) can
be defined as¹⁷

$$r_t = e_t + p_t^* - p_t$$

where $p_t^*$ and $p_t$ refer to the logarithms of U.S. and Canadian wholesale price
indices and $e_t$ is the logarithm of the Canadian dollar/U.S. dollar nominal exchange
rate.

To explain the deviations from PPP, we suppose there are two types of shocks: a
real shock and a nominal shock. The theory suggests that real shocks can cause
permanent changes in the real exchange rate, but nominal shocks can cause only
temporary movements in the real rate. For example, in the long run, if Canada doubles
its nominal money supply, the Canadian price level will double and the Canadian
dollar price of U.S. dollars will double. Hence, in the long run, the real exchange rate
remains invariant to a money supply shock.

For Step 1, we perform various unit root tests on the monthly Canadian/U.S. dollar
real and nominal exchange rates over the 1973:1 to 1989:12 period. Consistent
with other studies focusing on the post-Bretton Woods period, it is clear that real
and nominal rates can be characterized by nonstationary processes. We use the
first difference of the logarithm of each in the decomposition. Our BMA model has
the form:

$$\begin{bmatrix} \Delta r_t \\ \Delta e_t \end{bmatrix} = \begin{bmatrix} C_{11}(L) & C_{12}(L) \\ C_{21}(L) & C_{22}(L) \end{bmatrix} \begin{bmatrix} \varepsilon_{rt} \\ \varepsilon_{nt} \end{bmatrix}$$

where $\varepsilon_{rt}$ and $\varepsilon_{nt}$ represent the zero-mean mutually uncorrelated real and nominal
shocks, respectively.


The restriction that the nominal shocks have no long-run effect on the real exchange
rate is represented by the restriction that the coefficients in $C_{12}(L)$ sum to
zero; thus, if $c_{12}(k)$ is the $k$th coefficient in $C_{12}(L)$, as in (5.65), the restriction is

$$\sum_{k=0}^{\infty} c_{12}(k) = 0 \qquad (5.74)$$

The restriction in (5.74) implies that the cumulative effect of $\varepsilon_{nt}$ on $\Delta r_t$ is zero,
and consequently, the long-run effect of $\varepsilon_{nt}$ on the level of $r_t$ itself is zero. Put
another way, the nominal shock $\varepsilon_{nt}$ has only short-run effects on the real exchange
rate. Note that there is no restriction on the effects of a real shock on the real rate or
on the effects of either real or nominal shocks on the nominal exchange rate.

For Step 2, we estimate a bivariate VAR model for several lag lengths. At con- 
ventional significance levels, formal tests indicate that one lag is sufficient. 
However, to avoid the possibility of omitting important effects at longer lags, we 
performed the entire analysis using lag lengths of 1 month, 6 months, and 12 
months. 

The variance decomposition using the actual $\{\varepsilon_{rt}\}$ and $\{\varepsilon_{nt}\}$ sequences allows us
to assess the relative contributions of the real and nominal shocks to the forecast error
variance of the real and nominal exchange rate series.

Percent of Forecast Error Variance Accounted for by Real Shocks

Horizon         \Delta r_t     \Delta e_t

 1 month        100%           81.5%
 3 months       99.9           79.2
12 months       98.5           78.1
36 months       98.5           78.1


As is immediately evident, real shocks explain almost all the forecast error variance
of the real exchange rate at any forecast horizon. Nominal shocks accounted
for approximately 20% of the forecast error variance of the nominal exchange rate.
Our interpretation is that real shocks are responsible for movements in real and
nominal exchange rates. Hence, we should expect them to display sizable comovements.

Figure 5.8 shows the impulse response functions of the real and nominal exchange
rates to both types of shocks. For clarity, the results are shown for the levels
of exchange rates (as opposed to first differences) measured in terms of standard
deviations. For real shocks:


1. The effect of a “real” shock is to cause an immediate increase in the real and 
nominal exchange rate. It is interesting to note that the jump in the real value of 
the dollar is nearly the same as that of the nominal dollar. Moreover, these 
changes are all of a permanent nature. Real and nominal rates converge to their 
new long-run levels in about 9 months. 


2. The movement in the real rate to its long-run level is almost immediate, whereas
the nominal value of the U.S. dollar generally rises over time (i.e., the U.S. dollar
price of the Canadian dollar falls). There is little evidence of exchange rate
overshooting.

3. Long-run changes in the two rates are almost identical, but surprisingly, the
long-run real rate jumps more rapidly than the nominal rate.

[Figure 5.8. Panels: Response of real exchange rate; Response of nominal exchange rate. Horizontal axis labeled Quarters. Series: Real shock, Nominal shock.]

As required by our identification restriction, the effect of a nominal shock on the
real exchange rate is necessarily temporary. Notice that the effects of typical
“nominal” shocks of one standard deviation are all significantly smaller than the effects
of typical “real” shocks. A typical nominal shock causes a rise in the nominal value
of the U.S. dollar with no evidence of overshooting. Finally, the real U.S. dollar
initially moves in the same direction as the U.S. nominal dollar.

It is instructive to examine the hypothetical time paths of the nominal rate that
result from the decomposition. Normalize both rates such that January 1973 = 1.0.
Figure 5.9 shows that if all shocks had been nominal shocks, the Canadian dollar
would have declined (i.e., the U.S. dollar would have appreciated) rather steadily
throughout the entire period; it appears that the rate of depreciation would have
accelerated beginning in the early 1980s and continuing throughout 1989. The role of
the “real” shock was generally reinforcing that of the nominal shock. It is particularly
interesting to note that the real shock captures the major turning points of actual
rates. The sharp depreciation beginning in 1978 and the sharp appreciation
beginning around 1986 are the result of real, as opposed to nominal, factors.

[Figure 5.9 Decomposed real Canadian dollars. Horizontal axis: January 1973 through January 1989. Series: Actual, Real, Nominal.]

Limitations of the Technique

A problem with this type of decomposition is that there are many types of shocks.
As recognized by Blanchard and Quah (1989), the approach is limited by its ability
to identify at most only as many types of distinct shocks as there are variables.


Blanchard and Quah prove several propositions that are somewhat helpful when the
presence of three or more structural shocks is suspected. Suppose that there are several
disturbances having permanent effects, but only one having a temporary effect
on $\{y_t\}$. If the variance of one type of permanent disturbance grows “arbitrarily”
small relative to the other, then the decomposition scheme approaches the correct
decomposition. The second proposition they prove is that if there are multiple
permanent disturbances (temporary disturbances), the correct decomposition is possible
if and only if the individual distributed lag responses in the real and nominal
exchange rate are sufficiently similar across equations. By “sufficiently similar,”
Blanchard and Quah mean that the coefficients may differ up to a scalar lag distribution.
However, both propositions essentially imply that there are only two types of
disturbances. For the first proposition, the third disturbance must be arbitrarily
small. For the second proposition, the third disturbance must have a sufficiently
similar path as one of the others. It is wise to avoid such a decomposition when the
presence of three or more important disturbances is suspected.


SUMMARY AND CONCLUSIONS 


Intervention analysis was used to determine the effects of installing metal detectors 
in airports. More generally, intervention analysis can be used to ascertain how any 
deterministic function affects an economic time series. Usually, the shape of the in- 
tervention function is clear as in the metal detector example. However, there is a 
wide variety of possible intervention functions. If there is an ambiguity, the shape 
of the intervention function can be determined using the standard Box-Jenkins
criteria for model selection. The crucial assumption in intervention analysis is
that the intervention function has only deterministic components.

Transfer function analysis is appropriate if the "intervention" sequence is
stochastic. If {y_t} is endogenous and {z_t} exogenous, a transfer function can be
fit using a five-step procedure discussed in Section 2. The procedure is a
straightforward modification of the standard Box-Jenkins methodology. The resulting
impulse response function traces out the time path of the effects of {z_t}
realizations on the {y_t} sequence. The technique was illustrated by a study showing
that terrorist attacks caused Italy's tourism revenues to decline by a total of
600 million SDR.

With economic data, it is not always clear that one variable is independent of the
others, so that intervention and transfer function analysis may be inappropriate.
Vector autoregression (VAR) analysis treats all variables as jointly endogenous:
each variable is allowed to depend on its own past realizations and the past
realizations of all other variables in the system. No special attention is paid to
parsimony, since identification restrictions imposed for that purpose may be
inconsistent with economic theory. Granger causality tests, block exogeneity tests,
and lag-length tests can help select a more parsimonious model. A difficulty with
VAR analysis is that the underlying structural model cannot be recovered from the
estimated VAR. An arbitrary Choleski decomposition provides an extra equation
necessary for identification of the structural model. For each variable in the
system, innovation accounting techniques can be used to ascertain (1) the
percentage of the forecast error variance attributable to each of the other
variables and (2) the impulse responses to the various innovations. The technique
was illustrated by examining the relationship between terrorism and tourism in Spain.

Another difficulty of VAR analysis is that the system of equations is overpara- 
meterized. The Bayesian approach combines a set of prior beliefs with the tradi- 
tional VAR methods presented in the text. West and Harrison (1989) provides an 
approachable introduction to the Bayesian approach. Litterman (1981) proposed a
sensible set of Bayesian priors that have become the standard in Bayesian VAR 
models. Todd (1984) and Leamer (1986) provide very accessible applications of the 
Bayesian approach. 

An important development is the convergence of traditional economic theory and 
the VAR framework. Structural VARs impose an economic model on the contem- 
poraneous movements of the variables. As such, they allow for the identification of 
the parameters of the economic model and the structural shocks. The
Bernanke-Sims procedure can be used to identify (or overidentify) the structural 
innovations. The Blanchard and Quah methodology imposes long-run restrictions 
on the impulse response functions to exactly identify the structural innovations. An 
especially useful feature of the technique is that it provides a unique decomposition 
of an economic time series into its temporary and permanent components. 


QUESTIONS AND EXERCISES 

1. Consider three forms of the intervention variable: 
   Pulse: z_1 = 1 and all other z_i = 0
   Prolonged impulse: [values not legible in the source]
   [Third form not legible in the source]; z_i = 0 for all other values of i

A. Show how each of the following {y_t} sequences responds to the three types of
   interventions: [the sequences and part B are not legible in the source]


C. Show that an intervention variable will not have a permanent effect on a
   unit root process if all values of z_i sum to zero.

D. Discuss the plausible models you might choose if the {y_t} sequence is

   i. Stationary and you suspect that the intervention has a permanent effect
      on Ey_t.

   ii. Stationary and you suspect that the intervention has a growing and then
       a diminishing effect.

   iii. Nonstationary and you suspect that the intervention has a permanent
        effect on the level of {y_t}.

   iv. Nonstationary and you suspect that the intervention has a temporary
       effect on the level of {y_t}.

   v. Nonstationary and you suspect that the intervention increases the trend
      growth of {y_t}.

2. Let the realized value of the {z_t} sequence be such that z_1 = 1 and all other
   values of z_t = 0.


A. Use Equation (5.11) to trace out the effects of the {z_t} sequence on the time
   path of y_t.

B. Use Equation (5.12) to trace out the effects of the {z_t} sequence on the time
   paths of y_t and Δy_t.

C. Use Equation (5.13) to trace out the effects of the {z_t} sequence on the time
   paths of y_t and Δy_t.

D. Would your answers to parts A through C change if {z_t} was assumed to be
   a white-noise process and you were asked to trace out the effects of a z_t
   shock on the various {y_t} sequences?

E. Assume that {z_t} is a white-noise process with a variance equal to unity.

   i. Use (5.11) to derive the cross-correlogram between {z_t} and {y_t}.

   ii. Use (5.12) to derive the cross-correlogram between {z_t} and {Δy_t}.

   iii. Use (5.13) to derive the cross-correlogram between {z_t} and {Δy_t}.

   iv. Now suppose that z_t is the random walk process z_t = z_{t-1} + ε_t. Trace
       out the effects of an ε_t shock on the Δy_t sequence.


3. Consider the transfer function model y_t = 0.5y_{t-1} + z_t + ε_t, where z_t is
   the autoregressive process z_t = 0.5z_{t-1} + ε_zt.

A. Derive the CACF between the filtered {y_t} sequence and the {ε_zt} sequence.

B. Now suppose y_t = 0.5y_{t-1} + z_t + 0.5z_{t-1} + ε_t and z_t = 0.5z_{t-1} + ε_zt.
   Derive the cross-autocovariances between the filtered {y_t} sequence and ε_zt.
   Show that the first two cross-autocovariances are proportional to the transfer
   function coefficients. Show that the cross-covariances decay at the rate 0.5.


4. Use (5.28) to find the appropriate second-order stochastic difference equation
   for y_t:

      [ y_t ]   [ 0.8  0.2 ] [ y_{t-1} ]   [ e_1t ]
      [ z_t ] = [ 0.2  0.8 ] [ z_{t-1} ] + [ e_2t ]

A. Determine whether the {y_t} sequence is stationary.

B. Discuss the shape of the impulse response function of y_t to a one-unit shock
   in e_1t and a one-unit shock in e_2t.

C. Suppose e_1t = ε_yt + 0.5ε_zt and that e_2t = ε_zt. Discuss the shape of the
   impulse response function of y_t to a one-unit shock in ε_yt. Repeat for a
   one-unit shock in ε_zt.

D. Suppose e_1t = ε_yt and that e_2t = 0.5ε_yt + ε_zt. Discuss the shape of the
   impulse response function of y_t to a one-unit shock in ε_yt. Repeat for a
   one-unit shock in ε_zt.

E. Use your answers to C and D to explain why the ordering in a Choleski
   decomposition is important.

F. Using the notation in (5.21), find A_1^2 and A_1^3. Does A_1^n appear to
   approach zero (i.e., the null matrix)?
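
A rough numerical check on Question 4 can be sketched in Python using only the coefficient matrix given in the question. The helper names (`mat_mul`, `mat_pow`, `char_roots`) are illustrative and do not come from the text. The characteristic roots are computed from the trace and determinant of the 2 x 2 coefficient matrix, which yields the same quadratic that governs the homogeneous part of the second-order difference equation for y_t.

```python
import math

# Coefficient matrix A_1 from Question 4 (bivariate first-order VAR).
A1 = [[0.8, 0.2],
      [0.2, 0.8]]

def mat_mul(a, b):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_pow(a, n):
    """Raise a 2x2 matrix to the positive integer power n."""
    result = a
    for _ in range(n - 1):
        result = mat_mul(result, a)
    return result

def char_roots(a):
    """Roots of lambda^2 - trace*lambda + det = 0 (real-root case only)."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = trace ** 2 - 4.0 * det
    if disc < 0:
        raise ValueError("complex roots are not handled in this sketch")
    root = math.sqrt(disc)
    return (trace + root) / 2.0, (trace - root) / 2.0

roots = char_roots(A1)
A1_squared = mat_pow(A1, 2)
A1_cubed = mat_pow(A1, 3)
A1_50 = mat_pow(A1, 50)   # a high power, to see where A1^n is heading

print("characteristic roots:", roots)
print("A1^2 =", A1_squared)
print("A1^3 =", A1_cubed)
print("A1^50 =", A1_50)
```

Under these assumptions, the roots come out as 1 and 0.6, and the high powers of the matrix settle near a matrix with every element equal to 0.5 rather than shrinking toward the null matrix. That is the numerical counterpart of the stationarity question in part A and the limit question in part F.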


5. Using the notation of (5.21), suppose a_10 = 0, a_20 = 0, a_11 = 0.8,
   a_12 = 0.2, a_21 = 0.4, and a_22 = 0.1.

A. Find the appropriate second-order stochastic difference equation for y_t.
   Determine whether the {y_t} sequence is stationary.

B. Answer parts B through F of Question 4 using these new values of a_ij.

C. How would the solution for y_t change if a_10 = 0.2?


6. Suppose the residuals of a VAR are such that var(e_1) = 0.75, var(e_2) = 0.5,
   and cov(e_1, e_2) = 0.25.

A. Using (5.53) through (5.56) as guides, show that it is not possible to
   identify the structural VAR.

B. Using a Choleski decomposition such that b_12 = 0, find the identified values
   of b_21, var(ε_1), and var(ε_2).

C. Using a Choleski decomposition such that b_21 = 0, find the identified values
   of b_12, var(ε_1), and var(ε_2).

D. Using a Sims-Bernanke decomposition such that b_12 = 0.5, find the identified
   values of b_21, var(ε_1), and var(ε_2).

E. Using a Sims-Bernanke decomposition such that b_21 = 0.5, find the identified
   values of b_12, var(ε_1), and var(ε_2).

F. Suppose that the first three values of e_1t are estimated to be 1, 0, and -1
   and the first three values of e_2t are estimated to be -1, 0, and 1. Find the
   first three values of ε_1t and ε_2t using each of the decompositions in parts B
   through E.
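
The mechanics of a recursive (Choleski-type) identification can be sketched from the residual moments in Question 6. The convention used here is an assumption made for illustration: the first residual in the ordering is treated as a pure shock, and the second residual equals a coefficient `c` times that shock plus its own uncorrelated shock. How `c` maps to the b_12 or b_21 of the text, including its sign, depends on the form of (5.53) through (5.56), which is not reproduced in this section. The function name `recursive_identification` is hypothetical.

```python
# Residual moments given in Question 6.
var_e1, var_e2, cov_e12 = 0.75, 0.5, 0.25

def recursive_identification(var_first, var_second, cov):
    """Identify a two-variable recursive system.

    Assumed convention (illustrative only):
        e_first  = eps_first
        e_second = c * eps_first + eps_second
    with eps_first and eps_second uncorrelated.
    Returns (c, var(eps_first), var(eps_second)).
    """
    c = cov / var_first                           # from cov = c * var(eps_first)
    var_eps_first = var_first
    var_eps_second = var_second - c ** 2 * var_first
    return c, var_eps_first, var_eps_second

# Ordering 1: the first residual is treated as a pure shock.
c1, v1_first, v1_second = recursive_identification(var_e1, var_e2, cov_e12)

# Ordering 2: the second residual is treated as a pure shock.
c2, v2_first, v2_second = recursive_identification(var_e2, var_e1, cov_e12)

print("ordering 1: c =", c1, "shock variances =", v1_first, v1_second)
print("ordering 2: c =", c2, "shock variances =", v2_first, v2_second)
```

The two orderings attribute the same residual covariance to different structural coefficients and different shock variances, which is one way to see why the ordering matters whenever the residuals are correlated.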


7. This set of exercises uses data from the file entitled US.WK1. The first column
contains the U.S. money supply (as measured by M1) and the fifth column the
U.S. GDP Deflator (1985 = 100) for the period 1960:Q1 through 1991:Q4.
These two variables are labeled M1 and GDPDEF on the data disk. In
Questions 7 through 10, your task is to uncover the relationship between the
inflation rate and the rate of growth of the money supply.

Economic theory suggests that many variables influence inflation and 
money growth. Some of these variables are included in the file US.WK1. 
Respectively, columns 2, 3, and 4 hold the Treasury bill rate (denoted by 
TBILL), 3-year government bond yield (denoted by R3), and 10-year govern- 
ment bond yield (denoted by R10). Column 6 contains real GDP in 1985 prices 
(denoted by GDP85) and column 7 nominal government purchases (denoted by 
GOVT). To keep the issues as simple as possible, consider only a bivariate 

VAR between money and inflation.


A. Construct the rate of growth of the money supply (GM1) and inflation rate
   (INF) as the following logarithmic changes:

      GM1_t = log(M1_t) - log(M1_{t-1})
      INF_t = log(GDPDEF_t) - log(GDPDEF_{t-1})

You should find that the constructed variables have the following properties:

Series  Observations  Mean          Standard Error  Minimum        Maximum
INF     127           0.0119070404  0.0066458391    -0.0039847906  0.0296770174
GM1     127           0.0149101522  0.0295263232    -0.0471790362  0.0781839833


B. The bivariate VAR might have the form given by (5.44). One problem
   with this specification is that GM1_t has a strong seasonal component. In
   Exercise 5 of Chapter 2, you were asked to model the {M1} series using
   univariate methods. Recall that seasonal differencing was necessary. In
   VAR analysis, it is common practice to include seasonal dummy variables
   to capture the seasonality. Construct the dummy variables D_1, D_2,
   and D_3, where D_i = 1 in the ith quarter of each year and zero otherwise.

Interpret the effects of the seasonal dummies in the following bivariate VAR:

   GM1_t = A_10 + A_10(1)D_1 + A_10(2)D_2 + A_10(3)D_3 + A_11(L)GM1_{t-1}
           + A_12(L)INF_{t-1} + e_1t

   INF_t = A_20 + A_20(1)D_1 + A_20(2)D_2 + A_20(3)D_3 + A_21(L)GM1_{t-1}
           + A_22(L)INF_{t-1} + e_2t


C. Consider the bivariate VAR above using 12 lags of each variable and save 
the residuals. 


i. Explain why the estimation cannot begin earlier than 1963:Q2. 


   ii. Estimate the model (with the seasonal dummies) using 12 lags of each
       variable and save the residuals. You should find that
       log(|Σ_12|) = -20.56126.

   iii. Estimate the same model over the same sample period, now using only
        eight lags of each variable. You should find log(|Σ_8|) = -20.42120.

   iv. Use (5.45) to construct the likelihood ratio test for the null hypothesis
       of eight lags. How many restrictions are there in the system? How
       many regressors are there in each of the unrestricted equations? If you
       answer correctly, you should find that the calculated value of χ² with 16
       degrees of freedom is 12.184668 with a significance level of 0.73117262.
       Hence, it is not possible to reject the null of eight lags.


D. Repeat the procedure in part C in order to show that it is possible to further
   restrict the system to four lags of each variable. Now estimate models with
   eight and four lags over the sample period 1962:Q2 to 1991:Q4. (Note that
   the number of regressors in the unrestricted model is now 12.) You should
   find

      log(|Σ_8|) = -20.4279
      log(|Σ_4|) = -20.30502
      χ² (16 degrees of freedom) = 12.165234 with significance level 0.73252907

E. Show that it is inappropriate to restrict the system such that there is only
   one lag of each variable. Estimating the two models over the 1961:Q2 to
   1991:Q4 period, you should find

      log(|Σ_4|) = -20.32279
      log(|Σ_1|) = -19.89689
      χ² (12 degrees of freedom) = 47.274603 with significance level 0.00000418
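
The likelihood ratio statistics quoted in parts C and E of Question 7 can be approximately reproduced from the reported log determinants. The sketch below assumes that (5.45) is the small-sample-adjusted statistic (T - c)(log|Σ_r| - log|Σ_u|), where T is the number of usable observations and c is the number of regressors in each unrestricted equation; the observation counts are inferred from the stated sample periods. The function names are illustrative, and the chi-square tail probability uses the closed form that holds only for an even number of degrees of freedom, which is sufficient for these cases.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate with an even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # term now equals (x/2)^i / i!
        total += term
    return math.exp(-half) * total

def lag_length_lr(log_det_restricted, log_det_unrestricted, T, c):
    """Assumed form of (5.45): (T - c)(log|S_r| - log|S_u|)."""
    return (T - c) * (log_det_restricted - log_det_unrestricted)

# Part C: 12 versus 8 lags, 1963:Q2 to 1991:Q4 (T = 115 quarters).
# c = 2 variables * 12 lags + intercept + 3 seasonal dummies = 28.
stat_c = lag_length_lr(-20.42120, -20.56126, T=115, c=28)
p_c = chi2_sf_even_df(stat_c, df=16)   # 4 lags * 2 variables * 2 equations

# Part E: 4 versus 1 lag, 1961:Q2 to 1991:Q4 (T = 123 quarters).
# c = 2 variables * 4 lags + intercept + 3 seasonal dummies = 12.
stat_e = lag_length_lr(-19.89689, -20.32279, T=123, c=12)
p_e = chi2_sf_even_df(stat_e, df=12)   # 3 lags * 2 variables * 2 equations

print(f"part C: chi2(16) = {stat_c:.4f}, p = {p_c:.4f}")
print(f"part E: chi2(12) = {stat_e:.4f}, p = {p_e:.8f}")
```

Because the log determinants are reported to only five decimal places, the statistics computed this way agree with the values quoted in the exercise to about two or three decimal places rather than exactly.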


8. Question 7 suggested using a bivariate VAR with four lags. Explain how it is
possible to modify the procedure in order to test for the presence of the
seasonal dummy variables. Show that you can reject the restriction:

   A_10(1)D_1 = A_10(2)D_2 = A_10(3)D_3 = A_20(1)D_1 = A_20(2)D_2 = A_20(3)D_3 = 0

i. How does this procedure differ from the following test?

   A_20(1)D_1 = A_20(2)D_2 = A_20(3)D_3 = 0

9. Keep the seasonal dummies in both equations and estimate the bivariate VAR
with four lags over the 1961:Q2 to 1991:Q4 period.
A. How would you test to determine whether INF Granger causes GM1? 


B. Perform each of the indicated causality tests. 


   i. Verify that money growth Granger causes itself. The F-test for the
      restriction that all the coefficients of A_11(L) = 0 yields a value of
      3.3602 with a significance level of 0.0122948.

   ii. Verify that inflation Granger causes money growth. The F-test for the
       restriction that all coefficients of A_12(L) = 0 yields a value of 2.1472
       with a significance level of 0.0796779.

   iii. Verify that the F-test for the restriction that all coefficients of
        A_21(L) = 0 yields a value of 0.7670 with a significance level of
        0.5489179.

   iv. Verify that the F-test for the restriction that all coefficients of
       A_22(L) = 0 yields a value of 56.1908 with a significance level of
       0.0000000.


C. The Granger causality test indicates that inflation Granger causes money
   growth and Granger causes itself. Money growth, however, only Granger
   causes itself. Explain why it is not appropriate to conclude that money
   growth has no effect on inflation. What if you knew that the correlation
   coefficient between innovations in money growth (i.e., e_1t) and inflation
   (i.e., e_2t) was identically equal to zero? Why might these results change in
   the presence of a third variable (such as GDP85)?


10. Consider a Choleski decomposition such that innovations in inflation (denoted
by ε_2t) do not have a contemporaneous effect on money growth, but money
growth innovations (denoted by ε_1t) have a contemporaneous effect on inflation.
Represent the relationship between the regression equation errors and
pure money growth and inflation innovations in terms of (5.39) and (5.40).


A. If you are using a software package capable of calculating variance decom- 
positions, verify: 


Percent of forecast error variance
due to money shock

Steps ahead    GM1       INF
 1             100.00    0.1794
 4              94.58    0.4632
12              93.24    2.0339
24              92.85    2.3442


Interpret the figures in the table. 


B. Reverse the ordering of the Choleski decomposition, so that money growth 
innovations do not have a contemporaneous effect on inflation. Represent 
the relationship between the regression equation errors and pure money 
growth and inflation innovations in terms of (5.39) and (5.40). 


C. Verify: 


Percent of forecast error variance
due to money shock

Steps ahead    GM1      INF
 1             99.82    0.0000
 4             94.22    1.7180
12             93.15    2.3341
24             92.78    3.1891


Explain why this alternative ordering is nearly the same as that found in 
part A. What is the correlation coefficient between the regression error terms? 


D. What are the major weaknesses of this bivariate VAR study? Comment on 
the following issues: 
i. The treatment of seasonality. 


ii. Other variables that may affect the relationship between money growth 
and inflation. You may want to expand the VAR by including other 
variables in the file US.WK1. 


iii. Changes in the conduct of monetary policy.


11. In the next set of questions, you are asked to analyze the relationship between
short- and long-term interest rates. The data file US.WK1 contains some of the
relevant variables for the period 1960:Q1 through 1991:Q4. Respectively,
columns 2, 3, and 4 hold the Treasury bill rate (denoted by TBILL), 3-year
government bond yield (denoted by R3), and 10-year government bond yield
(denoted by R10). Column 6 contains the U.S. GDP Deflator (denoted by
GDPDEF, where 1985 = 100) and column 7 nominal government purchases
(denoted by GOVT).


A. Certain economic theories suggest a relationship between real interest rates 
and real government spending. It seems sensible to analyze a trivariate 
VAR using TBILL, R10, and a measure of real government purchases of 
goods and services. Toward this end, construct the variable RGOVT as the 
ratio GOVT/GDPDEF. You should find 


Series  Observations  Mean     Standard Error  Minimum     Maximum
RGOVT   128           6255.9   1438.69         3511.256    8868.6
TBILL   128           6.3959   2.79151059      2.32000000  15.0900
R10     128           7.6299   2.76273472      3.79000000  14.8500


B. Pretest the variables for the presence of unit roots using Dickey—Fuller 
tests. Using four lags and a constant, you should find the t-statistics on the 
lagged level of each variable to be 


RGOVT = -0.97872 
TBILL = -2.21122 
R10 = -1.90275 


C. Estimate the trivariate VAR in levels including three seasonal dummy vari- 
ables (see part B of Question 7 concerning the creation of the dummy vari- 
ables). Construct a likelihood ratio test to determine whether it is possible 
to restrict the number of lags from 12 to eight. You should find: 


   log(|Σ_12|) = 3.867667, log(|Σ_8|) = 4.700780
   χ² (36 degrees of freedom) = 63.316597 with significance level 0.00327933

Hence, reject the hypothesis that eight lags are sufficient to capture the dynamic
relationships in the data. (Note: For this test to be meaningful, the residuals of the
regression equations used to construct Σ should be stationary.)


D. Using the model with 12 lags: 


i. Find the correlations between the innovations. Since the correlation
   between the innovations in TBILL and R10 is 0.808, explain why the
   ordering in a Choleski decomposition is likely to be important.


ii. Show that each variable Granger causes the other variables at conven- 
tional significance levels. 


E. Consider the variance decompositions using a Choleski decomposition such
   that RGOVT innovations contemporaneously affect all variables,
   TBILL innovations contemporaneously affect themselves and R10, and R10
   innovations contemporaneously affect only R10. Write down this structure
   in terms of a general form of (5.39) and (5.40). Using this ordering, verify
   that the proportions of 24-step-ahead forecast error variance of RGOVT,
   TBILL, and R10 due to RGOVT, TBILL, and R10 innovations are


RGOVT = 89.07528, 9.21137, and 1.71335%, respectively 
TBILL = 13.77804, 84.67659, and 1.54537%, respectively 
R10 = 17.37698, 78.13322, and 4.48980%, respectively 


Thus, TBILL innovations “explain” 78.13322% of the forecast error vari- 
ance in R10, and R10 innovations explain only 1.54537% of the forecast error 
variance in TBILL. 


F. Use the reverse ordering such that R10 innovations affect all variables con- 
temporaneously, TBILL innovations contemporaneously affect TBILL and 
RGOVT, and RGOVT innovations contemporaneously affect only 
RGOVT. Compare your results to those in part E. 


12. The results from Question 11B suggest that all variables are nonstationary. 
Now estimate the same trivariate VAR (including seasonals), but use first dif- 
ferences instead of levels. 


A. Verify the following: 
i. The lag-length test for eight versus 12 lags yields

   log(|Σ_12|) = 4.108633, log(|Σ_8|) = 4.700780
   χ² (36 degrees of freedom) = 58.544793 with significance level 0.01017107

ii. If we use 12 lags, the correlation between TBILL and R10 innovations
    is 0.7776.

iii. The change in RGOVT_t (i.e., ΔRGOVT_t) does not Granger cause itself
     or ΔR10_t, but does Granger cause ΔTBILL_t at the 0.016 significance
     level.


B. Use the same ordering as in Question 11E. Verify that the proportions of
   24-step-ahead forecast error variance of ΔRGOVT_t, ΔTBILL_t, and ΔR10_t
   due to ΔRGOVT_t, ΔTBILL_t, and ΔR10_t innovations are

   ΔRGOVT = 71.54324, 18.22792, and 10.22885%, respectively
   ΔTBILL = 19.02489, 70.99188, and 9.98323%, respectively
   ΔR10 = 15.79140, 50.05796, and 34.15065%, respectively


C. Perform a block exogeneity test to determine whether RGOVT helps to
   "explain" the movements in interest rates.

D. Overall, compare the results of using the variables in levels to those using
   the variables in first differences.


ENDNOTES 


1. In terms of the notation of the previous chapter, z_t is equivalent to the level
   dummy variable D_L.

2. In other words, if c_0 ≠ 0, predicting y_{t+1} necessitates predicting the value
   of z_{t+1}.

3. In the identification process, we are primarily interested in the shape, not the
   height, of the cross-correlation function. It is useful to standardize the
   covariance by dividing through by σ_z²; the shape of the correlogram is
   proportional to this standardized covariance. Hence, if σ_z² = 1, the two are
   equivalent. The benefit of this procedure is that we can obtain the CACF from
   the transfer function.

4. In such circumstances, Box and Jenkins (1976) recommend differencing y_t and/or
   z_t, so that the resulting series are both stationary. The modern view cautions
   against this approach; as shown in the next chapter, a linear combination of
   nonstationary variables may be stationary. In such circumstances, the
   Box-Jenkins recommendation leads to overdifferencing. For the time being, it is
   assumed that both {y_t} and {z_t} are stationary processes.

5. We were able to obtain quarterly data from 1970:I to 1988:IV for Austria,
   Canada, Denmark, Finland, France, West Germany, Greece, Italy, the Netherlands,
   Norway, the U.K., and the United States. The International Monetary Fund's
   Balance of Payments Statistics reports all data in special drawing rights (SDR).
   Our dependent variable is the logarithm of a nation's revenues divided by the
   sum of the revenues for all 12 countries.

6. Tourism is highly seasonal; we tried several alternative deseasonalization
   techniques. The results reported here were obtained using seasonal dummy
   variables. Hence, y_t represents the deseasonalized logarithmic share of tourism
   receipts. The published paper reports results using quarterly differencing.
   When either type of deseasonalization was used, the final results were similar.

7. Expectations of the future can also be included in this framework. If the
   temperature {y_t} is an autoregressive process, the expected value of next
   period's temperature (i.e., E_t y_{t+1}) will depend on the current and past
   values. In (5.20), the presence of the terms y_t and y_{t-1} can represent how
   predictions regarding next period's temperature affect the current thermostat
   setting.

8. It is easily verified that this representation implies that ρ_12 = 0.8. By
   definition, the correlation coefficient ρ_12 is σ_12/(σ_1σ_2) and the covariance
   is Ee_1t e_2t = σ_12. If we use the numbers in the example,
   Ee_1t e_2t = E[ε_zt(ε_yt + 0.8ε_zt)] = 0.8σ_z². Since the decomposition equates
   var(e_2t) with σ_z², it follows that ρ_12 = 0.8 if σ_1² = σ_2².

9. Other types of identification restrictions are included in Sections 10 through 13.

10. In the example under consideration, the symmetry restriction on the coefficients
    means that var(ε_1t) is equal to var(ε_2t). This result does not generalize; it
    holds in the example because of the assumed equality var(e_1t) = var(e_2t).

11. The value |Σ_R| - |Σ| is asymptotically distributed as a χ² distribution with R
    degrees of freedom.

12. Since a key assumption of the technique is that E(ε_1t ε_2t) = 0, you might
    wonder how it is possible to assume that aggregate demand and supply shocks are
    independent. After all, if the stabilization authorities follow a feedback rule,
    aggregate demand will change in response to aggregate supply shocks. The key to
    understanding this apparent contradiction is that ε_1t is intended to be the
    orthogonalized portion of the demand shock, that is, the portion of the demand
    shock that does not change in response to aggregate supply.

13. For example, A_11(L) = a_11(0) + a_11(1)L + a_11(2)L² + ....

14. The VAR residuals also have a constant variance/covariance matrix. Hence, the
    time subscripts can be dropped.

15. Since two of the restrictions contain squared terms, there will be a positive
    value and an equal but opposite negative value for some of the coefficients.
    The set of coefficients to use is simply a matter of interpretation. In
    Blanchard and Quah's example, if c_11(0) is positive, positive demand shocks
    have a positive effect on output, and if c_11(0) is negative, the positive
    shock has a negative effect on output.

16. In doing so, it will be necessary to treat all ε_{t-i} = 0 for t - i < 1.

17. Here, Canada is treated as the home country, so that e_t is the Canadian dollar
    price of U.S. dollars and p*_t refers to the U.S. price level.


Chapter 6 


COINTEGRATION AND ERROR-CORRECTION MODELS


This chapter explores an exciting new development in econometrics: the estima- 
tion of a structural equation or VAR containing nonstationary variables. In univari- 
ate models, we have seen that a stochastic trend can be removed by differencing. 
The resulting stationary series can be estimated using univariate Box—Jenkins tech- 
niques. At one time, the conventional wisdom was to generalize this idea and dif- 
ference all nonstationary variables used in a regression analysis. However, it is now 
recognized that the appropriate way to treat nonstationary variables is not so 
straightforward in a multivariate context. It is quite possible for there to be a linear 
combination of integrated variables that is stationary; such variables are said to be 
cointegrated. Many economic models entail such cointegrating relationships. The 
aims of this chapter are to: 


1. Introduce the basic concept of cointegration and show that it applies in a variety 
of economic models. Any equilibrium relationship among a set of nonstationary 
variables implies that their stochastic trends must be linked. After all, the equi- 
librium relationship means that the variables cannot move independently of each 
other. This linkage among the stochastic trends necessitates that the variables be 
cointegrated. 


2. Consider the dynamic paths of cointegrated variables. Since the trends of
cointegrated variables are linked, the dynamic paths of such variables must bear some
relation to the current deviation from the equilibrium relationship. This connec- 
tion between the change in a variable and the deviation from equilibrium is ex- 
amined in detail. It is shown that the dynamics of a cointegrated system are such 
that the conventional wisdom was incorrect. After all, if the linear relationship is 
already stationary, differencing the relationship entails a misspecification error. 


3. Study the alternative ways to test for cointegration. The econometric methods
underlying the test procedures stem from the theory of simultaneous difference
equations. The theory is explained and used to develop the two most popular
cointegration tests. The proper way to estimate a system of cointegrated variables
is examined. Several illustrations of each methodology are provided. Moreover, the
two methods are compared by applying each to the same data set.


1. LINEAR COMBINATIONS OF INTEGRATED VARIABLES 


Since money demand studies stimulated much of the cointegration literature, we 
begin by considering a simple model of money demand. Theory suggests that indi- 
viduals want to hold a real quantity of money balances, so that the demand for 
nominal money holdings should be proportional to the price level. Moreover, as 
real income and the associated number of transactions increase, individuals will 
want to hold increased money balances. Finally, since the interest rate is the
opportunity cost of holding money, money demand should be negatively related to the
interest rate. In logarithms, an econometric specification for such an equation can
be written as


   m_t = β_0 + β_1 p_t + β_2 y_t + β_3 r_t + e_t                    (6.1)

where m_t = long-run money demand
      p_t = price level
      y_t = real income
      r_t = interest rate
      e_t = stationary disturbance term
      β_i = parameters to be estimated

and all variables but the interest rate are expressed in logarithms.

The hypothesis that the money market clears allows the researcher to collect
time-series data of the money supply (= money demand if the money market always
clears), the price level, real income (possibly measured using real GNP), and
an appropriate short-term interest rate. The behavioral assumptions require that
β_1 = 1, β_2 > 0, and β_3 < 0; a researcher conducting such a study would certainly
want to test these parameter restrictions. Be aware that the properties of the
unexplained portion of the demand for money (i.e., the {e_t} sequence) are an
integral part of the theory. If the theory is to make any sense at all, any
deviation in the demand for money must necessarily be temporary in nature. Clearly,
if e_t has a stochastic trend, the errors in the model will be cumulative so that
deviations from money market equilibrium will not be eliminated. Hence, a key
assumption of the theory is that the {e_t} sequence is stationary.


The problem confronting the researcher is that real GNP, the money supply,
price level, and interest rate can all be characterized as nonstationary I(1)
variables. As such, each variable can meander without any tendency to return to a
long-run level. However, the theory expressed in (6.1) asserts that there exists a
linear combination of these nonstationary variables that is stationary! Solving for
the error term, we can rewrite (6.1) as

   e_t = m_t - β_0 - β_1 p_t - β_2 y_t - β_3 r_t                    (6.2)


Since {e_t} must be stationary, it follows that the linear combination of integrated
variables given by the right-hand side of (6.2) must also be stationary. Thus, the
theory necessitates that the time paths of the four nonstationary variables {m_t},
{p_t}, {y_t}, and {r_t} be linked. This example illustrates the crucial insight that
has dominated much of the macroeconometric literature in recent years: Equilibrium
theories involving nonstationary variables require the existence of a combination
of the variables that is stationary.
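
The idea of a stationary combination of nonstationary variables can be illustrated with a small simulation. The sketch below is not taken from the text: it assumes two artificial series, `y` and `z`, that share one random-walk trend `mu`, with a hypothetical long-run coefficient `beta = 2`. Each series wanders because it inherits the trend, but the combination z_t - beta*y_t contains only the stationary noise terms.

```python
import random

random.seed(12345)  # arbitrary seed so the sketch is reproducible

T = 500
beta = 2.0          # hypothetical long-run coefficient linking the two series

mu = 0.0            # shared stochastic trend (a pure random walk)
y, z, noise_y, noise_z = [], [], [], []
for _ in range(T):
    mu += random.gauss(0.0, 1.0)      # the trend accumulates every shock
    ey = random.gauss(0.0, 1.0)       # stationary component of y
    ez = random.gauss(0.0, 1.0)       # stationary component of z
    y.append(mu + ey)
    z.append(beta * mu + ez)
    noise_y.append(ey)
    noise_z.append(ez)

# The combination z_t - beta*y_t removes the common trend entirely,
# leaving ez - beta*ey, which has no random-walk component.
combo = [zt - beta * yt for zt, yt in zip(z, y)]

def sample_variance(x):
    """Unbiased sample variance of a list of numbers."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

print("sample variance of y:    ", round(sample_variance(y), 2))
print("sample variance of z:    ", round(sample_variance(z), 2))
print("sample variance of combo:", round(sample_variance(combo), 2))
```

In this construction the vector (1, -beta) plays the role that the coefficients of (6.2) play for the money demand variables. With real data the coefficients are unknown and the stationarity of the combination has to be tested rather than imposed.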

The money demand function is just one example of a stationary combination of 
nonstationary variables. Within any equilibrium framework, the deviations from 
equilibrium must be temporary. Other important economic examples involving sta- 
tionary combinations of nonstationary variables include: 


1. Consumption Function Theory. A simple version of the permanent income
hypothesis maintains that total consumption (c_t) is the sum of permanent
consumption (c_t^p) and transitory consumption (c_t^t). Since permanent consumption
is proportional to permanent income (y_t^p), we can let β be the constant of
proportionality and write c_t = βy_t^p + c_t^t. Transitory consumption is
necessarily a stationary variable, and consumption and permanent income are
reasonably characterized as I(1) variables. As such, the permanent income
hypothesis requires that the linear combination of two I(1) variables given by
c_t - βy_t^p be stationary.


2. Unbiased Forward Market Hypothesis. One form of the efficient market hypothesis
asserts that the forward (or futures) price of an asset should equal the expected
value of that asset's spot price in the future. If you recall the discussion
of Corbae and Ouliaris (1986) in Chapter 4, you will remember that foreign
exchange market efficiency requires the one-period forward exchange rate to equal
the expectation of the spot rate in the next period. If we let f_t denote the log of
the one-period price of forward exchange in t, and s_t the log of the spot price of
foreign exchange in t, the theory asserts that E_t s_{t+1} = f_t. If this
relationship fails, speculators can expect to make a pure profit on their trades in
the foreign exchange market. If the agent's expectations are rational, the forecast
error for the spot rate in t + 1 will have a conditional mean equal to zero, so that
s_{t+1} - E_t s_{t+1} = ε_{t+1}, where E_t ε_{t+1} = 0. Combining the two equations
yields s_{t+1} = f_t + ε_{t+1}. Since {s_t} and {f_t} are I(1) variables, the
unbiased forward market hypothesis necessitates that there be a linear combination
of nonstationary spot and forward exchange rates that is stationary.


3. Commodity Market Arbitrage and Purchasing-Power Parity. Theories of spatial
competition suggest that in the short run, prices of similar products in varied
markets might differ. However, arbitrageurs will prevent the various prices from
moving too far apart even if the prices are nonstationary. Similarly, the prices of
Apple computers and PCs have exhibited sustained declines. Economic theory
suggests that these simultaneous declines are related to each other since the gap
between the prices of these differentiated products cannot continually widen.

Also, as we saw in Chapter 4, purchasing-power parity places restrictions on
the movements of nonstationary price levels and exchange rates. If $e_t$ denotes
the log of the price of foreign exchange and $p_t$ and $p_t^*$ denote, respectively, the
logs of the domestic and foreign price levels, long-run PPP requires that the
linear combination $e_t + p_t^* - p_t$ be stationary.


All these examples illustrate the concept of cointegration as introduced by 
Engle and Granger (1987). Their formal analysis begins by considering a set of 
economic variables in long-run equilibrium when 


$$\beta_1 x_{1t} + \beta_2 x_{2t} + \cdots + \beta_n x_{nt} = 0$$

If we let $\beta$ and $x_t$ denote the vectors $(\beta_1, \beta_2, \ldots, \beta_n)$ and $(x_{1t}, x_{2t}, \ldots, x_{nt})'$, the
system is in long-run equilibrium when $\beta x_t = 0$. The deviation from long-run
equilibrium, called the equilibrium error, is $e_t$, so that

$$e_t = \beta x_t$$


If the equilibrium is meaningful, it must be the case that the equilibrium error
process is stationary. Engle and Granger (1987) provide the following definition of
cointegration.

The components of the vector $x_t = (x_{1t}, x_{2t}, \ldots, x_{nt})'$ are said to be cointegrated
of order $d, b$, denoted by $x_t \sim CI(d, b)$, if

1. All components of $x_t$ are integrated of order $d$.

2. There exists a vector $\beta = (\beta_1, \beta_2, \ldots, \beta_n)$ such that the linear combination
$\beta x_t = \beta_1 x_{1t} + \beta_2 x_{2t} + \cdots + \beta_n x_{nt}$ is integrated of order $(d - b)$, where $b > 0$.

The vector $\beta$ is called the cointegrating vector.¹

In terms of equation (6.1), if the money supply, price level, real income, and
interest rate are all $I(1)$ and the linear combination
$m_t - \beta_0 - \beta_1 p_t - \beta_2 y_t - \beta_3 r_t = e_t$ is stationary, then the variables are
cointegrated of order (1, 1). The vector $x_t$ is $(m_t, 1, p_t, y_t, r_t)'$ and the cointegrating
vector $\beta$ is $(1, -\beta_0, -\beta_1, -\beta_2, -\beta_3)$. The deviation from long-run money market
equilibrium is $e_t$; since $\{e_t\}$ is stationary, this deviation is temporary in nature.

There are four very important points to note about the definition: 


1. Cointegration refers to a linear combination of nonstationary variables.
Theoretically, it is quite possible that nonlinear long-run relationships exist
among a set of integrated variables. However, the current state of econometric
practice is not able to test for nonlinear cointegrating relationships. Also note
that the cointegrating vector is not unique. If $(\beta_1, \beta_2, \ldots, \beta_n)$ is a cointegrating
vector, then for any nonzero value of $\lambda$, $(\lambda\beta_1, \lambda\beta_2, \ldots, \lambda\beta_n)$ is also a cointegrating
vector. Typically, one of the variables is used to normalize the cointegrating
vector by fixing its coefficient at unity. To normalize the cointegrating vector
with respect to $x_{1t}$, simply select $\lambda = 1/\beta_1$.


2. All variables must be integrated of the same order.² Of course, this does not
imply that all similarly integrated variables are cointegrated; usually, a set of $I(d)$
variables is not cointegrated. Such a lack of cointegration implies no long-run
equilibrium among the variables, so that they can wander arbitrarily far from
each other. If the variables are integrated of different orders, they cannot be
cointegrated. Suppose $x_{1t}$ is $I(d_1)$ and $x_{2t}$ is $I(d_2)$ where $d_2 > d_1$. Question 6 at the end
of this chapter asks you to prove that any linear combination of $x_{1t}$ and $x_{2t}$ is $I(d_2)$.

In a sense, the use of the term “equilibrium” is unfortunate since economic
theorists and econometricians use the term in different ways. Economic theorists
usually employ the term to refer to an equality between desired and actual
transactions. The econometric use of the term makes reference to any long-run
relationship among nonstationary variables. Cointegration does not require that the
long-run (i.e., equilibrium) relationship be generated by market forces or the
behavioral rules of individuals. In Engle and Granger’s use of the term, the
equilibrium relationship may be causal, behavioral, or simply a reduced-form
relationship among similarly trending variables.


3. If $x_t$ has $n$ components, there may be as many as $n - 1$ linearly independent
cointegrating vectors. Clearly, if $x_t$ contains only two variables, there can be at most
one independent cointegrating vector. The number of cointegrating vectors is
called the cointegrating rank of $x_t$. For example, suppose that the monetary
authorities followed a feedback rule such that they decreased the nominal money
supply when nominal GNP was high and increased the nominal money supply when
nominal GNP was low. This feedback rule might be represented by

$$m_t = \gamma_0 - \gamma_1 (y_t + p_t) + e_{1t} = \gamma_0 - \gamma_1 y_t - \gamma_1 p_t + e_{1t} \tag{6.3}$$

where $\{e_{1t}\}$ = a stationary error in the money supply feedback rule

Given the money demand function in (6.1), there are two cointegrating vectors
for the money supply, price level, real income, and interest rate. With
$x_t = (m_t, 1, p_t, y_t, r_t)'$, let $\beta$ be the (2 × 5) matrix:

$$\beta = \begin{bmatrix} 1 & -\beta_0 & -\beta_1 & -\beta_2 & -\beta_3 \\ 1 & -\gamma_0 & \gamma_1 & \gamma_1 & 0 \end{bmatrix}$$

The two linear combinations given by $\beta x_t$ are stationary. As such, the
cointegrating rank of $x_t$ is 2. As a practical matter, if multiple cointegrating vectors are
found, it may not be possible to identify the behavioral relationships from what
may be reduced-form relationships.


4. Most of the cointegration literature focuses on the case in which each variable
contains a single unit root. The reason is that traditional regression or time-series
analysis applies when variables are $I(0)$ and few economic variables are
integrated of an order higher than unity.³ When it is unambiguous, many authors use
the term cointegration to refer to the case in which variables are $CI(1, 1)$. The
remainder of the text follows this convention. Of course, many other possibilities
arise. For example, a set of $I(2)$ variables may be cointegrated of order $CI(2, 1)$, so
that there exists a linear combination that is $I(1)$.
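
The cointegrating-rank example in point 3 can be checked numerically. The sketch below is only an illustration: the parameter values, the seed, and the way $r_t$ is backed out of the money demand function are assumptions made here, not part of the text. It builds $x_t = (m_t, 1, p_t, y_t, r_t)'$ so that both (6.1) and (6.3) hold, and then confirms that each row of $\beta$ returns the corresponding stationary error.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
B0, B1, B2, B3 = 0.5, 1.0, 0.8, -0.4   # money demand coefficients in (6.1)
G0, G1 = 2.0, 0.6                      # feedback-rule coefficients in (6.3)

# Rows of beta, ordered to match x_t = (m_t, 1, p_t, y_t, r_t)'.
BETA = [
    (1.0, -B0, -B1, -B2, -B3),   # money demand
    (1.0, -G0, G1, G1, 0.0),     # money supply feedback rule
]


def simulate(T=200, seed=0):
    """Return (x_t vectors, stationary errors) with (6.1) and (6.3) both holding."""
    rng = random.Random(seed)
    p = y = 0.0
    xs, errors = [], []
    for _ in range(T):
        p += rng.gauss(0, 1)            # random walk price level
        y += rng.gauss(0, 1)            # random walk real income
        e = rng.gauss(0, 1)             # money demand error e_t
        e1 = rng.gauss(0, 1)            # feedback-rule error e_1t
        m = G0 - G1 * (y + p) + e1      # money supply set by (6.3)
        # Interest rate chosen so that the money demand function (6.1) holds.
        r = (m - B0 - B1 * p - B2 * y - e) / B3
        xs.append((m, 1.0, p, y, r))
        errors.append((e, e1))
    return xs, errors


def combine(beta_row, x):
    """Linear combination beta_row . x_t."""
    return sum(b * v for b, v in zip(beta_row, x))


xs, errors = simulate()
max_gap = max(
    abs(combine(BETA[k], x) - err[k])
    for x, err in zip(xs, errors)
    for k in range(2)
)
print(f"largest |beta x_t - error| = {max_gap:.2e}")
```

Up to floating-point rounding, both combinations reproduce the white-noise errors even though $m_t$, $p_t$, $y_t$, and $r_t$ all wander, which is what a cointegrating rank of 2 means here.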


Worksheet 6.1 illustrates some of the important properties of cointegration
relationships. In Case 1, both the $\{y_t\}$ and $\{z_t\}$ sequences were constructed so as to be
random walk plus noise processes. Although the 20 realizations shown generally
decline, extending the sample would eliminate this tendency. In any event, neither
series shows any tendency to return to a long-run level, and formal Dickey-Fuller tests
are not able to reject the null hypothesis of a unit root in either series. Although each
series is nonstationary, you can see that they do move together. In fact, the
difference between the series $\{y_t - z_t\}$, shown in the second graph, is stationary; the
“equilibrium error” term $e_t = (y_t - z_t)$ has a zero mean and constant variance.


CASE 1 The $\{y_t\}$ and $\{z_t\}$ sequences are both random walk plus noise processes.
Although each is nonstationary, the two sequences have the same stochastic
trend; hence, they are cointegrated such that the linear combination
$(y_t - z_t)$ is stationary. The equilibrium error term is an $I(0)$ process.

[Worksheet 6.1, Case 1 graphs, 20 observations. First graph: $y_t = \mu_{yt} + \epsilon_{yt}$ and
$z_t = \mu_{zt} + \epsilon_{zt}$. Second graph: the equilibrium error $y_t - z_t$.]

CASE 2 All three sequences are random walk plus noise processes. As constructed,
no two are cointegrated. However, the linear combination $(y_t + z_t - w_t)$ is
stationary; hence, the three are cointegrated. The equilibrium error is an
$I(0)$ process.

[Worksheet 6.1, Case 2 graphs. First graph: $y_t = \mu_{yt} + \epsilon_{yt}$, $z_t = \mu_{zt} + \epsilon_{zt}$, and
$w_t = \mu_{wt} + \epsilon_{wt}$. Second graph: the equilibrium error $y_t + z_t - w_t$.]




Case 2 illustrates cointegration among three random walk plus noise processes.
As in Case 1, no series exhibits a tendency to return to a long-run level, and formal
Dickey-Fuller tests are not able to reject the null hypothesis of a unit root in any of
the three. In contrast to the previous case, no two of the series appear to be
cointegrated; each series seems to “meander” away from the other two. However, as
shown in the second graph, there exists a stationary linear combination of the three:
$e_t = y_t + z_t - w_t$. Thus, it follows that the dynamic behavior of at least one variable
must be restricted by the values of the other variables in the system.

Figure 6.1 displays the information of Case 1 in a scatter plot of $\{y_t\}$ against the
associated value of $\{z_t\}$; each of the 20 points represents one of the ordered pairs
$(y_1, z_1), (y_2, z_2), \ldots, (y_{20}, z_{20})$. Comparing Worksheet 6.1 and Figure 6.1, you can see that
low values in the $\{y_t\}$ sequence are associated with low values in the $\{z_t\}$ sequence
and values near zero in one series are associated with values near zero in the other.
Since both series move together over time, there is a positive relationship between
the two. The least-squares line in the scatter plot reveals this strong positive
association. In fact, this line is the “long-run” equilibrium relationship between the series,
and the deviations from the line are the stationary deviations from long-run
equilibrium.

Figure 6.1 Scatter plot of cointegrated variables.

[Scatter plot not reproduced; the axes are values of z and values of y.]

The scatter plot was drawn using the {y} and {z} sequences from
Case 1 of Worksheet 6.1. Since both series decline over time,
there appears to be a positive relationship between the two.
The equilibrium regression line is shown.

For comparison purposes, graph (a) in Worksheet 6.2 shows 100 realizations of
two random walk plus noise processes that are not cointegrated. Each seems to
meander without any tendency to approach the other. The scatter plot shown in graph
(b) confirms the impression of no long-run relationship between the variables. The
deviations from the straight line showing the regression of $z_t$ on $y_t$ are substantial.
Plotting the regression residuals against time [see graph (c)] suggests that the
regression residuals are not stationary.


The $\{y_t\}$ and $\{z_t\}$ sequences are constructed to be independent random walk plus noise
processes. There is no cointegrating relationship between the two variables. As
shown in (a), both seem to meander without any tendency to come together. Graph
(b) shows the scatter plot of the two sequences and the regression line $z_t = \beta_0 + \beta_1 y_t$.
However, this regression line is spurious. As shown in graph (c), the regression
residuals are nonstationary.

[Worksheet 6.2 graphs, 100 observations. (a) $y_t = \mu_{yt} + \epsilon_{yt}$ and $z_t = \mu_{zt} + \epsilon_{zt}$.
(b) Regression of $z_t$ on $y_t$. (c) Regression residuals.]

2. COINTEGRATION AND COMMON TRENDS


Stock and Watson’s (1988) observation that cointegrated variables share common
stochastic trends provides a very useful way to understand cointegration
relationships.⁴ For ease of exposition, return to the case in which the vector $x_t$ contains only
two variables, so that $x_t = (y_t, z_t)'$. Ignoring cyclical and seasonal terms, we can
decompose each variable into a random walk plus an irregular (but not necessarily
white-noise) component.⁵ Hence, we can write

$$y_t = \mu_{yt} + \epsilon_{yt} \tag{6.4}$$
$$z_t = \mu_{zt} + \epsilon_{zt} \tag{6.5}$$

where $\mu_{it}$ = a random walk process representing the trend in variable $i$ in period $t$
      $\epsilon_{it}$ = the stationary (irregular) component of variable $i$ in period $t$.


If $\{y_t\}$ and $\{z_t\}$ are cointegrated of order (1, 1), there must be nonzero values of
$\beta_1$ and $\beta_2$ for which the linear combination $\beta_1 y_t + \beta_2 z_t$ is stationary, that is,

$$\beta_1 y_t + \beta_2 z_t = \beta_1(\mu_{yt} + \epsilon_{yt}) + \beta_2(\mu_{zt} + \epsilon_{zt})$$
$$= (\beta_1 \mu_{yt} + \beta_2 \mu_{zt}) + (\beta_1 \epsilon_{yt} + \beta_2 \epsilon_{zt}) \tag{6.6}$$

For $\beta_1 y_t + \beta_2 z_t$ to be stationary, the term $(\beta_1 \mu_{yt} + \beta_2 \mu_{zt})$ must vanish. After all, if
either of the two trends appears in (6.6), the linear combination $\beta_1 y_t + \beta_2 z_t$ will also
have a trend. Since the second term in parentheses is stationary, the necessary and
sufficient condition for $\{y_t\}$ and $\{z_t\}$ to be $CI(1, 1)$ is

$$\beta_1 \mu_{yt} + \beta_2 \mu_{zt} = 0 \tag{6.7}$$

Clearly, $\mu_{yt}$ and $\mu_{zt}$ are variables whose realized values will be continually
changing over time. Since we preclude both $\beta_1$ and $\beta_2$ from being equal to zero, it follows
that (6.7) holds for all $t$ if and only if

$$\mu_{yt} = -\beta_2 \mu_{zt} / \beta_1$$

For nonzero values of $\beta_1$ and $\beta_2$, the only way to ensure equality is for the
stochastic trends to be identical up to a scalar. Thus, up to the scalar $-\beta_2/\beta_1$, two $I(1)$
stochastic processes $\{y_t\}$ and $\{z_t\}$ must have the same stochastic trend if they are
cointegrated of order (1, 1).

Return your attention to Worksheet 6.1. In Case 1, the $\{y_t\}$ and $\{z_t\}$ sequences
were constructed so as to satisfy

$$y_t = \mu_t + \epsilon_{yt}$$
$$z_t = \mu_t + \epsilon_{zt}$$

and

$$\mu_t = \mu_{t-1} + \epsilon_t$$

where $\epsilon_{yt}$, $\epsilon_{zt}$, and $\epsilon_t$ = independently distributed white-noise disturbances.

By construction, $\mu_t$ is a pure random walk process representing the same
stochastic trend for both the $\{y_t\}$ and $\{z_t\}$ sequences. The value of $\mu_0$ was initialized to
zero and three sets of 20 random numbers were drawn to represent the $\{\epsilon_{yt}\}$, $\{\epsilon_{zt}\}$,
and $\{\epsilon_t\}$ sequences. Using these realizations and the initial value of $\mu_0$, we
constructed the $\{y_t\}$, $\{z_t\}$, and $\{\mu_t\}$ sequences. As you can clearly determine,
subtracting the realized value of $z_t$ from $y_t$ results in a stationary sequence:


$$y_t - z_t = (\mu_t + \epsilon_{yt}) - (\mu_t + \epsilon_{zt}) = \epsilon_{yt} - \epsilon_{zt}$$


To state the point using Engle and Granger’s terminology, premultiplying the
vector $x_t = (y_t, z_t)'$ by the cointegrating vector $\beta = (1, -1)$ yields the stationary
sequence $\epsilon_t = \epsilon_{yt} - \epsilon_{zt}$. Indeed, the equilibrium error term shown in the second graph
of Worksheet 6.1 has all the hallmarks of a stationary process. The essential insight
of Stock and Watson (1988) is that the parameters of the cointegrating vector must
be such that they purge the trend from the linear combination. Any other linear
combination of the two variables contains a trend, so that the cointegrating vector is
unique up to a normalizing scalar. For example, $\beta_3 y_t + \beta_4 z_t$ is not stationary unless
$\beta_3/\beta_4 = \beta_1/\beta_2$.
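
The construction of Case 1 can be reproduced with a short simulation. This is a minimal sketch under assumptions made here (standard normal disturbances, an arbitrary seed, and a longer sample of 5,000 observations so that the trend is visible in the sample variance); it is not the data behind Worksheet 6.1. It shows that the vector (1, -1) leaves only $\epsilon_{yt} - \epsilon_{zt}$, while another ratio such as (1, -0.5) leaves part of the trend in the combination.

```python
import random


def case1(T, seed=1):
    """Simulate y_t = mu_t + eps_yt and z_t = mu_t + eps_zt with a shared random walk mu_t."""
    rng = random.Random(seed)
    mu = 0.0                                  # mu_0 = 0
    y, z, eps_y, eps_z = [], [], [], []
    for _ in range(T):
        mu += rng.gauss(0, 1)                 # mu_t = mu_{t-1} + eps_t
        ey, ez = rng.gauss(0, 1), rng.gauss(0, 1)
        y.append(mu + ey)
        z.append(mu + ez)
        eps_y.append(ey)
        eps_z.append(ez)
    return y, z, eps_y, eps_z


def sample_variance(series):
    """Unbiased sample variance of a list of numbers."""
    mean = sum(series) / len(series)
    return sum((v - mean) ** 2 for v in series) / (len(series) - 1)


y, z, eps_y, eps_z = case1(T=5000)

# beta = (1, -1) removes mu_t, leaving eps_yt - eps_zt.
error = [a - b for a, b in zip(y, z)]
# Another ratio, here (1, -0.5), leaves 0.5 * mu_t in the combination.
other = [a - 0.5 * b for a, b in zip(y, z)]

print("variance of y - z     :", round(sample_variance(error), 2))
print("variance of y - 0.5 z :", round(sample_variance(other), 2))
```

The first variance stays near the sum of the two noise variances, whereas the second is dominated by the random walk and grows with the sample length.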

Recall that Case 2 illustrates cointegration between three random walk plus noise
processes. As in Case 1, each process is $I(1)$, and Dickey-Fuller unit root tests
would not be able to reject the null hypothesis that each contains a unit root. As you
can see in the lower portion of Worksheet 6.1, no pairwise combination of the
series appears to be cointegrated. Each series seems to meander but, as opposed to
Case 1, no one single series appears to remain close to any other series. However,
by construction, the trend in $w_t$ is the simple summation of the trends in $y_t$ and $z_t$:

$$\mu_{wt} = \mu_{yt} + \mu_{zt}$$

Here, the vector $x_t = (y_t, z_t, w_t)'$ has the cointegrating vector (1, 1, -1), so that the
linear combination $y_t + z_t - w_t$ is stationary. Consider:

$$y_t + z_t - w_t = (\mu_{yt} + \epsilon_{yt}) + (\mu_{zt} + \epsilon_{zt}) - (\mu_{wt} + \epsilon_{wt})$$
$$= \epsilon_{yt} + \epsilon_{zt} - \epsilon_{wt}$$
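
The three-variable case can be sketched the same way. The seed and the standard normal disturbances below are assumptions for illustration only. The code checks two identities: the combination (1, 1, -1) leaves only the irregular components, whereas a pairwise difference such as $y_t - w_t$ still carries the trend $-\mu_{zt}$.

```python
import random


def case2(T=20, seed=2):
    """Simulate three random walk plus noise series with mu_wt = mu_yt + mu_zt."""
    rng = random.Random(seed)
    mu_y = mu_z = 0.0
    rows = []
    for _ in range(T):
        mu_y += rng.gauss(0, 1)          # trend in y
        mu_z += rng.gauss(0, 1)          # trend in z
        mu_w = mu_y + mu_z               # trend in w is the sum of the other two
        ey, ez, ew = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append({
            "y": mu_y + ey, "z": mu_z + ez, "w": mu_w + ew,
            "mu_z": mu_z, "ey": ey, "ez": ez, "ew": ew,
        })
    return rows


rows = case2()

# Cointegrating vector (1, 1, -1): the trends cancel.
triple_gap = max(
    abs((r["y"] + r["z"] - r["w"]) - (r["ey"] + r["ez"] - r["ew"])) for r in rows
)
# Pairwise combination (1, 0, -1): the trend -mu_zt remains.
pair_gap = max(
    abs((r["y"] - r["w"]) - (-r["mu_z"] + r["ey"] - r["ew"])) for r in rows
)
print(triple_gap, pair_gap)
```

Both gaps are zero up to rounding, which confirms that the three series are cointegrated jointly even though the pair $(y_t, w_t)$ differs by a random walk.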

The example illustrates the general point that cointegration will occur whenever
the trend in one variable can be expressed as a linear combination of the trends in
the other variable(s). In such circumstances, it is always possible to find a vector $\beta$
such that the linear combination $\beta_1 y_t + \beta_2 z_t + \beta_3 w_t$ does not contain a trend. The
result easily generalizes to the case of $n$ variables. Consider the vector representation:

$$x_t = \mu_t + \epsilon_t \tag{6.8}$$

where $x_t$ = the vector $(x_{1t}, x_{2t}, \ldots, x_{nt})'$
      $\mu_t$ = the vector of stochastic trends $(\mu_{1t}, \mu_{2t}, \ldots, \mu_{nt})'$
      $\epsilon_t$ = an $n \times 1$ vector of irregular components


If one trend can be expressed as a linear combination of the other trends in the
system, it means that there exists a vector $\beta$ such that

$$\beta_1 \mu_{1t} + \beta_2 \mu_{2t} + \cdots + \beta_n \mu_{nt} = 0$$

Premultiply (6.8) by this set of $\beta_i$’s to obtain

$$\beta x_t = \beta \mu_t + \beta \epsilon_t$$

Since $\beta \mu_t = 0$, it follows that $\beta x_t = \beta \epsilon_t$. Hence, the linear combination $\beta x_t$ is
stationary. The argument easily generalizes to the case in which there are multiple
linear relationships among the trends. If the cointegrating rank is $r$, there are $r < n$
linear relationships among the trends, so that we can write

$$\beta \mu_t = 0$$

where $\beta$ = an $r \times n$ matrix consisting of elements $\beta_{ij}$


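For example, if there are two cointegrating vectors among $n$ variables, there are
two independent cointegrating vectors of the form:

$$\beta = \begin{bmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1n} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2n} \end{bmatrix}$$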

Notice that it is possible to subtract $\beta_{11}/\beta_{21}$ times row 2 from row 1 to yield
another linear combination of the $x_t$ that is stationary. However, there will be only
$n - 1$ nonzero coefficients of the $x_{it}$ in this combination. More generally, if there are
$r$ cointegrating vectors among $n$ variables, there exists a cointegrating vector for
each subset of $(n - r + 1)$ variables; with $r = 2$, this is the $n - 1$ case just described.
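
The row operation described above is simple to carry out by hand or in code. The sketch below uses two hypothetical cointegrating vectors for $n = 4$ variables (the numbers are illustrative assumptions) and subtracts $\beta_{11}/\beta_{21}$ times row 2 from row 1, so that the first variable drops out of the new combination.

```python
def eliminate_first(row1, row2):
    """Return row1 - (b11 / b21) * row2, which has a zero first coefficient."""
    if row2[0] == 0:
        raise ValueError("row 2 needs a nonzero first coefficient")
    factor = row1[0] / row2[0]
    return [a - factor * b for a, b in zip(row1, row2)]


# Two hypothetical cointegrating vectors among n = 4 variables.
beta = [
    [1.0, -0.5, 0.0, -1.0],
    [2.0, 1.0, -3.0, 0.0],
]

combo = eliminate_first(beta[0], beta[1])
print(combo)   # [0.0, -1.0, 1.5, -1.0]
```

Because any linear combination of stationary series is stationary, the new vector is also a cointegrating vector, and it involves only the remaining $n - 1 = 3$ variables.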


3. COINTEGRATION AND ERROR CORRECTION 


A principal feature of cointegrated variables is that their time paths are influenced
by the extent of any deviation from long-run equilibrium. After all, if the system is
to return to the long-run equilibrium, the movements of at least some of the
variables must respond to the magnitude of the disequilibrium. For example, theories of
the term structure of interest rates imply a long-run relationship between long- and
short-term rates. If the gap between the long- and short-term rates is “large” relative
to the long-run relationship, the short-term rate must ultimately rise relative to the
long-term rate. Of course, the gap can be closed by (1) an increase in the short-term
rate and/or a decrease in the long-term rate, (2) an increase in the long-term rate but
a commensurately larger rise in the short-term rate, or (3) a fall in the long-term
rate but a smaller fall in the short-term rate. Without a full dynamic specification of
the model, it is not possible to determine which of the possibilities will occur.
Nevertheless, the short-run dynamics must be influenced by the deviation from the
long-run relationship.

The dynamic model implied by this discussion is one of error correction. In an
error-correction model, the short-term dynamics of the variables in the system are
influenced by the deviation from equilibrium. If we assume that both interest rates
are $I(1)$, a simple error-correction model that could apply to the term structure of
interest rates is⁶

$$\Delta r_{St} = \alpha_S (r_{Lt-1} - \beta r_{St-1}) + \epsilon_{St}, \quad \alpha_S > 0 \tag{6.9}$$
$$\Delta r_{Lt} = -\alpha_L (r_{Lt-1} - \beta r_{St-1}) + \epsilon_{Lt}, \quad \alpha_L > 0 \tag{6.10}$$

where $r_{Lt}$ and $r_{St}$ are the long- and short-term interest rates, respectively.

The two terms represented by $\epsilon_{St}$ and $\epsilon_{Lt}$ are white-noise disturbance terms that may
be correlated and $\alpha_S$, $\alpha_L$, and $\beta$ are positive parameters.

As specified, the short- and long-term interest rates change in response to
stochastic shocks (represented by $\epsilon_{St}$ and $\epsilon_{Lt}$) and to the previous period’s deviation
from long-run equilibrium. Everything else equal, if this deviation happened to be
positive (so that $r_{Lt-1} - \beta r_{St-1} > 0$), the short-term interest rate would rise and the
long-term rate would fall. Long-run equilibrium is attained when $r_{Lt} = \beta r_{St}$.
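
A small simulation makes the adjustment mechanism concrete. The parameter values, starting rates, shock variance, and seed below are assumptions chosen only for illustration. Subtracting $\beta$ times (6.9) from (6.10) implies that the gap $g_t = r_{Lt} - \beta r_{St}$ follows $g_t = (1 - \alpha_L - \beta\alpha_S) g_{t-1} + (\epsilon_{Lt} - \beta\epsilon_{St})$, and the code verifies that identity along a simulated path.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
ALPHA_S, ALPHA_L, BETA = 0.2, 0.1, 1.0


def simulate_ecm(T=200, seed=3, r_s0=5.0, r_l0=7.0):
    """Simulate (6.9) and (6.10); return the gaps r_L - BETA * r_S and the shocks."""
    rng = random.Random(seed)
    r_s, r_l = r_s0, r_l0
    gaps = [r_l - BETA * r_s]
    shocks = []
    for _ in range(T):
        gap = r_l - BETA * r_s               # last period's deviation
        e_s, e_l = rng.gauss(0, 0.1), rng.gauss(0, 0.1)
        r_s += ALPHA_S * gap + e_s           # short rate rises when the gap is positive
        r_l += -ALPHA_L * gap + e_l          # long rate falls when the gap is positive
        gaps.append(r_l - BETA * r_s)
        shocks.append((e_s, e_l))
    return gaps, shocks


gaps, shocks = simulate_ecm()
rho = 1 - ALPHA_L - BETA * ALPHA_S           # implied AR(1) coefficient of the gap

max_dev = max(
    abs(gaps[t + 1] - (rho * gaps[t] + shocks[t][1] - BETA * shocks[t][0]))
    for t in range(len(shocks))
)
print("rho =", round(rho, 3), " largest deviation from the AR(1) form =", max_dev)
```

With these values the gap has an autoregressive coefficient of about 0.7, so the initial deviation of 2 percentage points dies out; the gap is stationary as long as the coefficient is less than one in absolute value.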

Here you can see the relationship between error-correcting models and
cointegrated variables. By assumption, $\Delta r_{St}$ is stationary, so that the left-hand side of (6.9)
is $I(0)$. For (6.9) to be sensible, the right-hand side must be $I(0)$ as well. Given that
$\epsilon_{St}$ is stationary, it follows that the linear combination $r_{Lt-1} - \beta r_{St-1}$ must also be
stationary; hence, the two interest rates must be cointegrated with the cointegrating
vector $(1, -\beta)$. Of course, the identical argument applies to (6.10). The essential
point to note is that the error-correction representation necessitates the two
variables be cointegrated of order $CI(1, 1)$. This result is unaltered if we formulate a
more general model by introducing the lagged changes of each rate into both
equations:⁷

$$\Delta r_{St} = a_{10} + \alpha_S (r_{Lt-1} - \beta r_{St-1}) + \sum a_{11}(i) \Delta r_{St-i} + \sum a_{12}(i) \Delta r_{Lt-i} + \epsilon_{St} \tag{6.11}$$
$$\Delta r_{Lt} = a_{20} - \alpha_L (r_{Lt-1} - \beta r_{St-1}) + \sum a_{21}(i) \Delta r_{St-i} + \sum a_{22}(i) \Delta r_{Lt-i} + \epsilon_{Lt} \tag{6.12}$$

Again, $\epsilon_{St}$, $\epsilon_{Lt}$, and all terms involving $\Delta r_{St-i}$ and $\Delta r_{Lt-i}$ are stationary. Thus, the
linear combination of interest rates $(r_{Lt-1} - \beta r_{St-1})$ must also be stationary.


Inspection of (6.11) and (6.12) reveals a striking similarity to the VAR models of
the previous chapter. This two-variable error-correction model is a bivariate VAR
in first differences augmented by the error-correction terms $\alpha_S (r_{Lt-1} - \beta r_{St-1})$ and
$-\alpha_L (r_{Lt-1} - \beta r_{St-1})$. Notice that $\alpha_S$ and $\alpha_L$ have the interpretation of speed of
adjustment parameters. The larger $\alpha_S$ is, the greater the response of $r_{St}$ to the previous
period’s deviation from long-run equilibrium. At the opposite extreme, very small
values of $\alpha_S$ imply that the short-term interest rate is unresponsive to last period’s
equilibrium error. For the $\{\Delta r_{St}\}$ sequence to be unaffected by the long-term
interest rate sequence, $\alpha_S$ and all the $a_{12}(i)$ coefficients must be equal to zero. Thus, the
absence of Granger causality for cointegrated variables requires the additional
condition that the speed of adjustment coefficient be equal to zero. Of course, at
least one of the speed of adjustment terms in (6.11) and (6.12) must be nonzero. If
both $\alpha_S$ and $\alpha_L$ are equal to zero, the long-run equilibrium relationship does not
appear and the model is not one of error correction or cointegration.

The result is easily generalized to the $n$-variable model. Formally, the $(n \times 1)$
vector $x_t = (x_{1t}, x_{2t}, \ldots, x_{nt})'$ has an error-correction representation if it can be
expressed in the form:

$$\Delta x_t = \pi_0 + \pi x_{t-1} + \pi_1 \Delta x_{t-1} + \pi_2 \Delta x_{t-2} + \cdots + \pi_p \Delta x_{t-p} + \epsilon_t \tag{6.13}$$

where $\pi_0$ = an $(n \times 1)$ vector of intercept terms with elements $\pi_{i0}$
      $\pi_i$ = $(n \times n)$ coefficient matrices with elements $\pi_{jk}(i)$
      $\pi$ = a matrix with elements $\pi_{jk}$ such that one or more of the $\pi_{jk} \neq 0$
      $\epsilon_t$ = an $(n \times 1)$ vector with elements $\epsilon_{it}$

Note that the disturbance terms are such that $\epsilon_{it}$ may be correlated with $\epsilon_{jt}$.

Let all variables in $x_t$ be $I(1)$. Now, if there is an error-correction representation
of these variables as in (6.13), there is necessarily a linear combination of the $I(1)$
variables that is stationary. Solving (6.13) for $\pi x_{t-1}$ yields

$$\pi x_{t-1} = \Delta x_t - \pi_0 - \sum \pi_i \Delta x_{t-i} - \epsilon_t$$

Since each expression on the right-hand side is stationary, $\pi x_{t-1}$ must also be
stationary. Since $\pi$ contains only constants, each row of $\pi$ is a cointegrating vector of $x_t$.
For example, the first row can be written as $(\pi_{11} x_{1t-1} + \pi_{12} x_{2t-1} + \cdots + \pi_{1n} x_{nt-1})$. Since
each series $x_{it-1}$ is $I(1)$, $(\pi_{11}, \pi_{12}, \ldots, \pi_{1n})$ must be a cointegrating vector for $x_t$.

The key feature in (6.13) is the presence of the matrix $\pi$. There are two important
points to note:

1. If all elements of $\pi$ equal zero, (6.13) is a traditional VAR in first differences. In
such circumstances, there is no error-correction representation since $\Delta x_t$ does not
respond to the previous period’s deviation from long-run equilibrium.

2. If one or more of the $\pi_{jk}$ differs from zero, $\Delta x_t$ responds to the previous period’s
deviation from long-run equilibrium. Hence, estimating $x_t$ as a VAR in first
differences is inappropriate if $x_t$ has an error-correction representation. The
omission of the expression $\pi x_{t-1}$ entails a misspecification error if $x_t$ has an
error-correction representation as in (6.13).


A good way to examine the relationship between cointegration and error
correction is to study the properties of the simple VAR model:

$$y_t = a_{11} y_{t-1} + a_{12} z_{t-1} + \epsilon_{yt} \tag{6.14}$$
$$z_t = a_{21} y_{t-1} + a_{22} z_{t-1} + \epsilon_{zt} \tag{6.15}$$

where $\epsilon_{yt}$ and $\epsilon_{zt}$ are white-noise disturbances that may be correlated with each
other and, for simplicity, intercept terms have been ignored. Using lag
operators, we can write (6.14) and (6.15) as

$$(1 - a_{11}L) y_t - a_{12}L z_t = \epsilon_{yt}$$
$$-a_{21}L y_t + (1 - a_{22}L) z_t = \epsilon_{zt}$$

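The next step is to solve for $y_t$ and $z_t$. Writing the system in matrix form, we
obtain

$$\begin{bmatrix} (1 - a_{11}L) & -a_{12}L \\ -a_{21}L & (1 - a_{22}L) \end{bmatrix} \begin{bmatrix} y_t \\ z_t \end{bmatrix} = \begin{bmatrix} \epsilon_{yt} \\ \epsilon_{zt} \end{bmatrix}$$

Using Cramer’s rule or matrix inversion, we can obtain the solutions for $y_t$ and $z_t$
as

$$y_t = \frac{(1 - a_{22}L)\epsilon_{yt} + a_{12}L\epsilon_{zt}}{(1 - a_{11}L)(1 - a_{22}L) - a_{12}a_{21}L^2} \tag{6.16}$$

$$z_t = \frac{a_{21}L\epsilon_{yt} + (1 - a_{11}L)\epsilon_{zt}}{(1 - a_{11}L)(1 - a_{22}L) - a_{12}a_{21}L^2} \tag{6.17}$$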


We have converted the two-variable first-order system represented by (6.14) and
(6.15) into two univariate second-order difference equations of the type examined
in Chapter 1. Note that both variables have the same inverse characteristic equation
$(1 - a_{11}L)(1 - a_{22}L) - a_{12}a_{21}L^2$. Setting $(1 - a_{11}L)(1 - a_{22}L) - a_{12}a_{21}L^2 = 0$ and
solving for $L$ yield the two roots of the inverse characteristic equation. In order to
work with the characteristic roots (as opposed to the inverse characteristic roots),
define $\lambda = 1/L$ and write the characteristic equation as

$$\lambda^2 - (a_{11} + a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21}) = 0 \tag{6.18}$$
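
The roots of (6.18) follow from the quadratic formula. The sketch below is a small helper written for illustration (the function name and the coefficient values are assumptions, not from the text); it uses complex arithmetic so that it also handles the case of complex roots.

```python
import cmath


def characteristic_roots(a11, a12, a21, a22):
    """Roots of lambda^2 - (a11 + a22) lambda + (a11 a22 - a12 a21) = 0, as in (6.18)."""
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2


# Hypothetical coefficients for (6.14) and (6.15).
lam1, lam2 = characteristic_roots(0.5, 0.2, 0.3, 0.4)
print(lam1, lam2)   # approximately 0.7 and 0.2
```

For these coefficients both roots lie inside the unit circle, so both variables share a stable second-order difference equation.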


Since the two variables have the same characteristic equation, the characteristic
roots of (6.18) determine the time paths of both variables. The following remarks
summarize the time paths of $\{y_t\}$ and $\{z_t\}$:


1. If both characteristic roots $(\lambda_1, \lambda_2)$ lie inside the unit circle, (6.16) and (6.17)
yield stable solutions for $\{y_t\}$ and $\{z_t\}$. If $t$ is sufficiently large or the initial
conditions are such that the homogeneous solution is zero, the stability condition
guarantees that the variables are stationary. The variables cannot be cointegrated
of order (1, 1) since each will be stationary.

2. If either root lies outside the unit circle, the solutions are explosive. Neither
variable is difference stationary, so that they cannot be $CI(1, 1)$. In the same way, if
both characteristic roots are unity, the second difference of each variable will be
stationary. Since each is $I(2)$, the variables cannot be $CI(1, 1)$.

3. As you can see from (6.14) and (6.15), if $a_{12} = a_{21} = 0$, the solution is trivial. For
$\{y_t\}$ and $\{z_t\}$ to be unit root processes, it is necessary for $a_{11} = a_{22} = 1$. It follows
that $\lambda_1 = \lambda_2 = 1$ and the two variables evolve without any long-run equilibrium
relationship; hence, the variables cannot be cointegrated.


4. For $\{y_t\}$ and $\{z_t\}$ to be $CI(1, 1)$, it is necessary for one characteristic root to be
unity and the other less than unity in absolute value. In this instance, each
variable will have the same stochastic trend and the first difference of each variable
will be stationary. For example, if $\lambda_1 = 1$, (6.16) will have the form:

$$y_t = [(1 - a_{22}L)\epsilon_{yt} + a_{12}L\epsilon_{zt}] / [(1 - L)(1 - \lambda_2 L)]$$

or multiplying by $(1 - L)$, we get

$$(1 - L) y_t = \Delta y_t = [(1 - a_{22}L)\epsilon_{yt} + a_{12}L\epsilon_{zt}] / (1 - \lambda_2 L)$$

which is stationary if $|\lambda_2| < 1$.


Thus, to ensure that the variables are $CI(1, 1)$, we must set one of the
characteristic roots equal to unity and the other to a value that is less than unity in absolute
value. For the larger of the two roots to equal unity, it must be the case that

$$0.5(a_{11} + a_{22}) + 0.5\sqrt{(a_{11}^2 + a_{22}^2) - 2a_{11}a_{22} + 4a_{12}a_{21}} = 1$$

so that after some simplification, the coefficients are seen to satisfy⁸

$$a_{11} = [(1 - a_{22}) - a_{12}a_{21}] / (1 - a_{22}) \tag{6.19}$$
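
Restriction (6.19) can be checked numerically. In the sketch below the values of $a_{12}$, $a_{21}$, and $a_{22}$ are hypothetical; $a_{11}$ is then computed from (6.19) and the roots of (6.18) are recovered with the quadratic formula. Since the roots sum to $a_{11} + a_{22}$, the second root should equal $a_{11} + a_{22} - 1$ once the first is unity.

```python
import cmath


def a11_for_unit_root(a12, a21, a22):
    """Value of a11 implied by (6.19); requires a22 != 1."""
    return ((1 - a22) - a12 * a21) / (1 - a22)


def characteristic_roots(a11, a12, a21, a22):
    """Roots of (6.18), the one with the larger real part first."""
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2


a12, a21, a22 = 0.2, 0.2, 0.8             # hypothetical values
a11 = a11_for_unit_root(a12, a21, a22)    # approximately 0.8
lam1, lam2 = characteristic_roots(a11, a12, a21, a22)

print("a11  =", round(a11, 6))
print("lam1 =", lam1, " lam2 =", lam2)
print("|lam2| < 1:", abs(lam2) < 1)
```

Here the roots are approximately 1 and 0.6, so this parameterization has exactly one unit root and a second root inside the unit circle.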


Now consider the second characteristic root. Since $a_{12}$ and/or $a_{21}$ must differ
from zero if the variables are cointegrated, the condition $|\lambda_2| < 1$ requires


$$a_{22} > -1 \tag{6.20}$$


and 


$$a_{12}a_{21} + (a_{22})^2 < 1 \tag{6.21}$$


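Equations (6.19), (6.20), and (6.21) are restrictions we must place on the
coefficients of (6.14) and (6.15) if we want to ensure that the variables are cointegrated
of order (1, 1). To see how these coefficient restrictions bear on the nature of the
solution, write (6.14) and (6.15) as

$$\begin{bmatrix} \Delta y_t \\ \Delta z_t \end{bmatrix} = \begin{bmatrix} a_{11} - 1 & a_{12} \\ a_{21} & a_{22} - 1 \end{bmatrix} \begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix} + \begin{bmatrix} \epsilon_{yt} \\ \epsilon_{zt} \end{bmatrix} \tag{6.22}$$

Now, (6.19) implies that $a_{11} - 1 = -a_{12}a_{21}/(1 - a_{22})$, so that after a bit of
manipulation, (6.22) can be written in the form:

$$\Delta y_t = -[a_{12}a_{21}/(1 - a_{22})] y_{t-1} + a_{12} z_{t-1} + \epsilon_{yt} \tag{6.23}$$
$$\Delta z_t = a_{21} y_{t-1} - (1 - a_{22}) z_{t-1} + \epsilon_{zt} \tag{6.24}$$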


Equations (6.23) and (6.24) comprise an error-correction model. If both $a_{12}$ and
$a_{21}$ differ from zero, we can normalize the cointegrating vector with respect to
either variable. Normalizing with respect to $y_t$, we get

$$\Delta y_t = \alpha_y (y_{t-1} - \beta z_{t-1}) + \epsilon_{yt}$$
$$\Delta z_t = \alpha_z (y_{t-1} - \beta z_{t-1}) + \epsilon_{zt}$$

where $\alpha_y = -a_{12}a_{21}/(1 - a_{22})$
      $\beta = (1 - a_{22})/a_{21}$
      $\alpha_z = a_{21}$
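
The mapping from the VAR coefficients to the error-correction parameters can be verified with a few lines of code. The coefficient values and the lagged levels below are hypothetical, and the helper names are introduced here only for illustration. With the shocks set to zero, the changes implied by (6.14) and (6.15) should match $\alpha_y (y_{t-1} - \beta z_{t-1})$ and $\alpha_z (y_{t-1} - \beta z_{t-1})$.

```python
def ecm_parameters(a12, a21, a22):
    """Return (alpha_y, alpha_z, beta); requires a21 != 0 and a22 != 1."""
    alpha_y = -a12 * a21 / (1 - a22)
    alpha_z = a21
    beta = (1 - a22) / a21
    return alpha_y, alpha_z, beta


def var_step(y_lag, z_lag, a11, a12, a21, a22):
    """Changes (dy, dz) implied by (6.14) and (6.15) with the shocks set to zero."""
    dy = (a11 - 1) * y_lag + a12 * z_lag
    dz = a21 * y_lag + (a22 - 1) * z_lag
    return dy, dz


def ecm_step(y_lag, z_lag, alpha_y, alpha_z, beta):
    """Changes (dy, dz) implied by the error-correction form, shocks set to zero."""
    gap = y_lag - beta * z_lag
    return alpha_y * gap, alpha_z * gap


# Hypothetical coefficients that satisfy (6.19): a11 = 1 - a12*a21/(1 - a22) = 0.8.
a11, a12, a21, a22 = 0.8, 0.2, 0.2, 0.8
alpha_y, alpha_z, beta = ecm_parameters(a12, a21, a22)

dy_var, dz_var = var_step(3.0, 1.0, a11, a12, a21, a22)
dy_ecm, dz_ecm = ecm_step(3.0, 1.0, alpha_y, alpha_z, beta)
print(dy_var, dz_var)
print(dy_ecm, dz_ecm)
```

Both forms give a change of about -0.4 in $y_t$ and +0.4 in $z_t$: with $\alpha_y < 0$ and $\alpha_z > 0$, a positive deviation $y_{t-1} - \beta z_{t-1}$ pushes $y_t$ down and $z_t$ up.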


You can see that $y_t$ and $z_t$ change in response to the previous period’s deviation
from long-run equilibrium: $y_{t-1} - \beta z_{t-1}$. If $y_{t-1} = \beta z_{t-1}$, $y_t$ and $z_t$ change only in
response to $\epsilon_{yt}$ and $\epsilon_{zt}$ shocks. Moreover, if $\alpha_y < 0$ and $\alpha_z > 0$, $y_t$ decreases and $z_t$
increases in response to a positive deviation from long-run equilibrium.

You can easily convince yourself that conditions (6.20) and (6.21) ensure that
$\beta \neq 0$ and at least one of the speed of adjustment parameters (i.e., $\alpha_y$ and $\alpha_z$) is not
equal to zero. Now, refer to (6.9) and (6.10); you can see this model is in exactly
the same form as the interest rate example presented in the beginning of this
section.

Although both $a_{12}$ and $a_{21}$ cannot equal zero, an interesting special case arises if
one of these coefficients is zero. For example, if we set $a_{12} = 0$, the speed of
adjustment coefficient $\alpha_y = 0$. In this case, $y_t$ changes only in response to $\epsilon_{yt}$ as $\Delta y_t = \epsilon_{yt}$.⁹
The $\{z_t\}$ sequence does all the correction to eliminate any deviation from long-run
equilibrium.


To highlight some of the important implications of this simple model, we have 
shown: 


1. The restrictions necessary to ensure that the variables are $CI(1, 1)$ guarantee
that an error-correction model exists. In our example, both $\{y_t\}$ and $\{z_t\}$ are unit
root processes but the linear combination $y_t - \beta z_t$ is stationary; the normalized
cointegrating vector is $[1, -(1 - a_{22})/a_{21}]$. The variables have an error-correction
representation with speed of adjustment coefficients $\alpha_y = -a_{12}a_{21}/(1 - a_{22})$ and
$\alpha_z = a_{21}$. It was also shown that an error-correction model for $I(1)$ variables
necessarily implies cointegration. This finding illustrates the Granger representation
theorem stating that for any set of $I(1)$ variables, error correction and
cointegration are equivalent representations.


2. Cointegration necessitates coefficient restrictions in a VAR model. Let x, = 
On z) and e, = (€,,, €x), SO that we can write (6.22) in the form: 


AX, = TY, tE soe Adee es SB (6,25) 


Clearly, it is inappropriate to estimate a VAR of cointegrated variables using 
only first differences. Estimating (6.25) without the expression Tx, would 
eliminate the error—correction portion of the model. It is also important to note 
that the rows of n are not linearly independent if the variables are cointegrated. 
Multiplying each element in row | by —(1 — @2)/a,z yields the corresponding el- 
ement in row 2. Thus, the determinant of 7 is equal to zero and y, and z, have the 
error-correction representation given by (6.23) and (6.24). 

This two-variable example illustrates the very important insights of Johansen
(1988) and Stock and Watson (1988) that we can use the rank of $\pi$ to determine
whether or not two variables $\{y_t\}$ and $\{z_t\}$ are cointegrated. Compare the
determinant of $\pi$ to the characteristic equation given by (6.18). If the largest
characteristic root equals unity ($\lambda_1 = 1$), it follows that the determinant of $\pi$ is zero and
$\pi$ has a rank equal to unity. If $\pi$ were to have a rank of zero, it would be necessary
for $a_{11} = 1$, $a_{22} = 1$, and $a_{12} = a_{21} = 0$. The VAR represented by (6.14) and
(6.15) would be nothing more than $\Delta y_t = \varepsilon_{yt}$ and $\Delta z_t = \varepsilon_{zt}$. In this case, both the
$\{y_t\}$ and $\{z_t\}$ sequences are unit root processes without any cointegrating vector.
Finally, if the rank of $\pi$ is full, then neither characteristic root can be unity, so
that the $\{y_t\}$ and $\{z_t\}$ sequences are jointly stationary.
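As a rough numerical check of the rank argument, the sketch below builds $\pi = A - I$ for a two-variable VAR and inspects its characteristic roots, determinant, and rank. The VAR form written in the comments and every coefficient value are assumptions chosen purely for illustration; they are not estimates from the text.

```python
import numpy as np

# Assumed two-variable VAR (illustrative form and values):
#   y_t = a11*y_{t-1} + a12*z_{t-1} + e_yt
#   z_t = a21*y_{t-1} + a22*z_{t-1} + e_zt
a12, a21, a22 = 0.2, 0.3, 0.6
# Impose the cointegration restriction so that one characteristic root is unity.
a11 = 1.0 - a12 * a21 / (1.0 - a22)

A = np.array([[a11, a12], [a21, a22]])
pi = A - np.eye(2)                      # Delta x_t = pi x_{t-1} + e_t

roots = np.linalg.eigvals(A)            # characteristic roots of the VAR
det_pi = np.linalg.det(pi)
rank_pi = np.linalg.matrix_rank(pi, tol=1e-10)

# Row 1 scaled by -(1 - a22)/a12 should reproduce row 2.
scaled_row1 = pi[0] * (-(1.0 - a22) / a12)
# Normalizing row 2 on y gives the cointegrating vector [1, -(1 - a22)/a21].
coint_vector = pi[1] / pi[1, 0]

print("roots:", np.sort(roots.real))
print("det(pi):", det_pi, "rank(pi):", rank_pi)
print("scaled row 1:", scaled_row1, "row 2:", pi[1])
print("normalized cointegrating vector:", coint_vector)
```

Setting `a11 = a22 = 1` and `a12 = a21 = 0` in the same sketch gives a zero matrix (rank zero), while unrestricted stable coefficients give rank two.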


3. In general, both variables in a cointegrated system will respond to a deviation
from long-run equilibrium. However, it is possible that one (but not both) of the
speed of adjustment parameters is zero. In this circumstance, that variable does
not respond to the discrepancy from long-run equilibrium and the other variable
does all the adjustment. Hence, it is necessary to reinterpret Granger causality
in a cointegrated system. In a cointegrated system, $\{z_t\}$ does not Granger cause
$\{y_t\}$ if lagged values $\Delta z_{t-i}$ do not enter the $\Delta y_t$ equation and if $y_t$ does not
respond to the deviation from long-run equilibrium. For example, in the cointegrated
system of (6.11) and (6.12), $\{r_{Lt}\}$ does not Granger cause $\{r_{St}\}$ if all
$a_{12}(i) = 0$ and $\alpha_S = 0$.


The n-Variable Case 


Little is altered in the n-variable case. The relationship between cointegration, error
correction, and the rank of the matrix $\pi$ is invariant to adding variables to the
system. The interesting feature introduced in the n-variable case is the possibility of
multiple cointegrating vectors. Now consider a more general version of (6.25):


$x_t = A_1 x_{t-1} + \varepsilon_t$    (6.26)

where $x_t$ = the $(n \times 1)$ vector $(x_{1t}, x_{2t}, \ldots, x_{nt})'$
$\varepsilon_t$ = the $(n \times 1)$ vector $(\varepsilon_{1t}, \varepsilon_{2t}, \ldots, \varepsilon_{nt})'$
$A_1$ = an $(n \times n)$ matrix of parameters


Subtracting $x_{t-1}$ from each side of (6.26) and letting $I$ be an $(n \times n)$ identity
matrix, we get


$\Delta x_t = -(I - A_1)x_{t-1} + \varepsilon_t = \pi x_{t-1} + \varepsilon_t$    (6.27)

where $\pi$ is the $(n \times n)$ matrix $-(I - A_1)$ and $\pi_{ij}$ denotes the element in row $i$ and
column $j$ of $\pi$. As you can see, (6.27) is a special case of (6.13) such that
all $\pi_i = 0$.


Again, the crucial issue for cointegration concerns the rank of the $(n \times n)$ matrix
$\pi$. If the rank of this matrix is zero, each element of $\pi$ must equal zero. In this
instance, (6.27) is equivalent to an n-variable VAR in first differences:


$\Delta x_t = \varepsilon_t$


Here, each $\Delta x_{it} = \varepsilon_{it}$ so that the first difference of each variable in the vector $x_t$ is
$I(0)$. Since each $x_{it} = x_{it-1} + \varepsilon_{it}$, all the $\{x_{it}\}$ sequences are unit root processes and
there is no linear combination of the variables that is stationary.

At the other extreme, suppose that $\pi$ is of full rank. The long-run solution to
(6.27) is given by the n independent equations:


$\pi_{11}x_{1t} + \pi_{12}x_{2t} + \pi_{13}x_{3t} + \cdots + \pi_{1n}x_{nt} = 0$
$\pi_{21}x_{1t} + \pi_{22}x_{2t} + \pi_{23}x_{3t} + \cdots + \pi_{2n}x_{nt} = 0$
$\vdots$
$\pi_{n1}x_{1t} + \pi_{n2}x_{2t} + \pi_{n3}x_{3t} + \cdots + \pi_{nn}x_{nt} = 0$    (6.28)


Each of these n equations is an independent restriction on the long-run solution
of the variables; the n variables in the system face n long-run constraints. In this
case, each of the n variables contained in the vector $x_t$ must be stationary with the
long-run values given by (6.28).

In intermediate cases, in which the rank of $\pi$ is equal to $r$, there are $r$ cointegrating
vectors. If $r = 1$, there is a single cointegrating vector given by any row of the
matrix $\pi$. Each $\{x_{it}\}$ sequence can be written in error-correction form. For example,
we can write $\Delta x_{1t}$ as


$\Delta x_{1t} = \pi_{11}x_{1t-1} + \pi_{12}x_{2t-1} + \cdots + \pi_{1n}x_{nt-1} + \varepsilon_{1t}$


or, normalizing with respect to $x_{1t-1}$, we can set $\alpha_1 = \pi_{11}$ and $\beta_{1j} = \pi_{1j}/\pi_{11}$ to obtain


$\Delta x_{1t} = \alpha_1(x_{1t-1} + \beta_{12}x_{2t-1} + \cdots + \beta_{1n}x_{nt-1}) + \varepsilon_{1t}$    (6.29)


In the long run, the $\{x_{it}\}$ will satisfy the relationship:


$x_{1t} + \beta_{12}x_{2t} + \cdots + \beta_{1n}x_{nt} = 0$


Hence, the normalized cointegrating vector is $(1, \beta_{12}, \beta_{13}, \ldots, \beta_{1n})$ and the speed
of adjustment parameter is $\alpha_1$. In the same way, with two cointegrating vectors the
long-run values of the variables will satisfy the two relationships:


$\pi_{11}x_{1t} + \pi_{12}x_{2t} + \cdots + \pi_{1n}x_{nt} = 0$
$\pi_{21}x_{1t} + \pi_{22}x_{2t} + \cdots + \pi_{2n}x_{nt} = 0$

which can be appropriately normalized.
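The normalization in (6.29) can be illustrated with a small numerical sketch. Here a rank-one $\pi$ for three variables is built as the outer product of an adjustment vector and a cointegrating vector; that construction, the names `alpha` and `beta`, and all numbers are assumptions made for the example rather than anything taken from the text.

```python
import numpy as np

# Assumed speed-of-adjustment coefficients, one per equation (illustrative).
alpha = np.array([[-0.4], [0.1], [0.2]])
# Assumed cointegrating vector, already normalized on x1 (illustrative).
beta = np.array([[1.0, -0.5, -2.0]])

pi = alpha @ beta                       # (3 x 3) matrix with rank 1 by construction
rank_pi = np.linalg.matrix_rank(pi, tol=1e-10)

# Normalizing row 1 as in (6.29): alpha_1 = pi_11 and beta_1j = pi_1j / pi_11.
alpha_1 = pi[0, 0]
beta_1 = pi[0] / pi[0, 0]

# With r = 1 every row is proportional to the same vector, so normalizing
# each row by its first element gives identical rows.
normalized_rows = pi / pi[:, [0]]

print("rank(pi):", rank_pi)
print("alpha_1:", alpha_1, "normalized vector:", beta_1)
print(normalized_rows)
```

With two cointegrating vectors the same construction would use an `alpha` with two columns and a `beta` with two rows, giving a matrix of rank two.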

The main point here is that there are two important ways to test for cointegration.
The Engle-Granger methodology seeks to determine whether the residuals of the
equilibrium relationship are stationary. The Johansen (1988) and Stock-Watson
(1988) methodologies determine the rank of $\pi$. The Engle-Granger approach is the
subject of the next three sections. Sections 7 through 10 examine the Johansen
(1988) and Stock-Watson (1988) methodologies.




4. TESTING FOR COINTEGRATION: THE 
ENGLE-GRANGER METHODOLOGY 


To explain the Engle-Granger testing procedure, let us begin with the type of problem
likely to be encountered in applied studies. Suppose that two variables, say $y_t$
and $z_t$, are believed to be integrated of order 1 and we want to determine whether
there exists an equilibrium relationship between the two. Engle and Granger (1987)
propose a straightforward test of whether two $I(1)$ variables are cointegrated of order
$CI(1, 1)$.


STEP 1: Pretest the variables for their order of integration. By definition, cointegration
necessitates that the variables be integrated of the same order. Thus,
the first step in the analysis is to pretest each variable to determine its order
of integration. The Dickey-Fuller, augmented Dickey-Fuller, and/or
Phillips-Perron tests discussed in Chapter 4 can be used to infer the number
of unit roots (if any) in each of the variables. If both variables are stationary,
it is not necessary to proceed since standard time-series methods
apply to stationary variables. If the variables are integrated of different orders,
it is possible to conclude that they are not cointegrated.¹⁰


STEP 2: Estimate the long-run equilibrium relationship. If the results of Step 1
indicate that both $\{y_t\}$ and $\{z_t\}$ are $I(1)$, the next step is to estimate the
long-run equilibrium relationship in the form:


$y_t = \beta_0 + \beta_1 z_t + e_t$    (6.30)


If the variables are cointegrated, an OLS regression yields a "super-consistent"
estimator of the cointegrating parameters $\beta_0$ and $\beta_1$. Stock (1987)
proves that the OLS estimates of $\beta_0$ and $\beta_1$ converge faster than in OLS
models using stationary variables. To explain, reexamine the scatter plot
shown in Figure 6.1. You can see that the effect of the common trend
dominates the effect of the stationary component; both variables seem to
rise and fall in tandem. Hence, there is a strong linear relationship as
shown by the regression line drawn in the figure.

In order to determine if the variables are actually cointegrated, denote
the residual sequence from this equation by $\{\hat{e}_t\}$. Thus, $\{\hat{e}_t\}$ is the series of
the estimated residuals of the long-run relationship. If these deviations
from long-run equilibrium are found to be stationary, the $\{y_t\}$ and $\{z_t\}$
sequences are cointegrated of order (1, 1). It would be convenient if we
could perform a Dickey-Fuller test on these residuals to determine their
order of integration. Consider the autoregression of the residuals:


$\Delta\hat{e}_t = a_1\hat{e}_{t-1} + \varepsilon_t$    (6.31)


Since the $\{\hat{e}_t\}$ sequence is a residual from a regression equation, there is
no need to include an intercept term; the parameter of interest in (6.31) is
$a_1$. If we cannot reject the null hypothesis $a_1 = 0$, we can conclude that the
residual series contains a unit root. Hence, we conclude that the $\{y_t\}$ and
$\{z_t\}$ sequences are not cointegrated. The more precise wording is awkward
because of a triple negative, but to be technically correct, if it is not possible
to reject the null hypothesis $|a_1| = 0$, we cannot reject the hypothesis
that the variables are not cointegrated. Instead, the rejection of the null
hypothesis implies that the residual sequence is stationary.¹¹ Given that
both $\{y_t\}$ and $\{z_t\}$ were found to be $I(1)$ and the residuals are stationary,
we can conclude that the series are cointegrated of order (1, 1).

In most applied studies, it is not possible to use the Dickey-Fuller tables
themselves. The problem is that the $\{\hat{e}_t\}$ sequence is generated from a
regression equation; the researcher does not know the actual error $e_t$, only
the estimate of the error $\hat{e}_t$. The methodology of fitting the regression in
(6.30) selects values of $\beta_0$ and $\beta_1$ that minimize the sum of squared residuals.
Since the residual variance is made as small as possible, the procedure
is prejudiced toward finding a stationary error process in (6.31). Hence,
the test statistic used to test the magnitude of $a_1$ must reflect this fact. Only
if $\beta_0$ and $\beta_1$ were known in advance and used to construct the true $\{e_t\}$
sequence would an ordinary Dickey-Fuller table be appropriate. Fortunately,
Engle and Granger provide test statistics that can be used to test the
hypothesis $a_1 = 0$. When more than two variables appear in the equilibrium
relationship, the appropriate tables are provided by Engle and Yoo (1987).

If the residuals of (6.31) do not appear to be white noise, an augmented
Dickey-Fuller test can be used instead of (6.31). Suppose that diagnostic
checks indicate that the $\{\varepsilon_t\}$ sequence of (6.31) exhibits serial correlation.
Instead of using the results from (6.31), estimate the autoregression:


$\Delta\hat{e}_t = a_1\hat{e}_{t-1} + \sum_{i=1}^{n} a_{i+1}\Delta\hat{e}_{t-i} + \varepsilon_t$    (6.32)


Again, if we reject the null hypothesis $a_1 = 0$ in favor of $-2 < a_1 < 0$, we can
conclude that the residual sequence is stationary and $\{y_t\}$ and $\{z_t\}$ are $CI(1, 1)$.


STEP 3: Estimate the error-correction model. If the variables are cointegrated (i.e.,
if the null hypothesis of no cointegration is rejected), the residuals from
the equilibrium regression can be used to estimate the error-correction
model. If $\{y_t\}$ and $\{z_t\}$ are $CI(1, 1)$, the variables have the error-correction
form:


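$\Delta y_t = \alpha_1 + \alpha_y(y_{t-1} - \beta_1 z_{t-1}) + \sum_{i=1} \alpha_{11}(i)\Delta y_{t-i} + \sum_{i=1} \alpha_{12}(i)\Delta z_{t-i} + \varepsilon_{yt}$    (6.33)

$\Delta z_t = \alpha_2 + \alpha_z(y_{t-1} - \beta_1 z_{t-1}) + \sum_{i=1} \alpha_{21}(i)\Delta y_{t-i} + \sum_{i=1} \alpha_{22}(i)\Delta z_{t-i} + \varepsilon_{zt}$    (6.34)


where $\beta_1$ = the parameter of the cointegrating vector given by
(6.30)
$\varepsilon_{yt}$ and $\varepsilon_{zt}$ = white-noise disturbances (which may be correlated
with each other)


and $\alpha_1$, $\alpha_2$, $\alpha_y$, $\alpha_z$, $\alpha_{11}(i)$, $\alpha_{12}(i)$, $\alpha_{21}(i)$, and $\alpha_{22}(i)$ are all parameters.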

Engle and Granger (1987) propose a clever way to circumvent the
cross-equation restrictions involved in the direct estimation of (6.33) and
(6.34). The value of the residual $\hat{e}_{t-1}$ estimates the deviation from long-run
equilibrium in period $(t - 1)$. Hence, it is possible to use the saved residuals
$\{\hat{e}_{t-1}\}$ obtained in Step 2 as an instrument for the expression
$y_{t-1} - \beta_1 z_{t-1}$ in (6.33) and (6.34). Thus, using the saved residuals from the
estimation of the long-run equilibrium relationship, we can estimate the
error-correcting model as


$\Delta y_t = \alpha_1 + \alpha_y\hat{e}_{t-1} + \sum_{i=1} \alpha_{11}(i)\Delta y_{t-i} + \sum_{i=1} \alpha_{12}(i)\Delta z_{t-i} + \varepsilon_{yt}$    (6.35)

$\Delta z_t = \alpha_2 + \alpha_z\hat{e}_{t-1} + \sum_{i=1} \alpha_{21}(i)\Delta y_{t-i} + \sum_{i=1} \alpha_{22}(i)\Delta z_{t-i} + \varepsilon_{zt}$    (6.36)


Other than the error-correction term $\hat{e}_{t-1}$, Equations (6.35) and (6.36)
constitute a VAR in first differences. This near VAR can be estimated using
the same methodology developed in Chapter 5. All the procedures developed
for a VAR apply to the near VAR. Notably:


1. OLS is an efficient estimation strategy since each equation contains the 
same set of regressors. 


2. Since all terms in (6.35) and (6.36) are stationary [i.e., $\Delta y_t$ and its lags,
$\Delta z_t$ and its lags, and $\hat{e}_{t-1}$ are $I(0)$], the test statistics used in traditional
VAR analysis are appropriate for (6.35) and (6.36). For example, lag
lengths can be determined using a $\chi^2$ test and the restriction that all
$\alpha_{jk}(i) = 0$ can be checked using an F-test. If there is a single cointegrating
vector, restrictions concerning $\alpha_y$ or $\alpha_z$ can be tested using a
t-test. Asymptotic theory indicates that the t-statistics for $\alpha_y$ and $\alpha_z$
converge to a t-distribution as the sample size increases.
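Steps 2 and 3 can be sketched in a few lines of numpy. The data-generating process below is hypothetical: `ALPHA_Y`, `BETA`, the sample size, and the choice to let only $y_t$ adjust are assumptions made so that the true answer is known. The helper `ols` is likewise an illustrative name, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
ALPHA_Y, BETA = -0.5, 2.0        # assumed adjustment speed and cointegrating slope

# Hypothetical cointegrated pair: z is a random walk and y corrects toward BETA*z.
y = np.zeros(T)
z = np.zeros(T)
for t in range(1, T):
    gap = y[t - 1] - BETA * z[t - 1]
    y[t] = y[t - 1] + ALPHA_Y * gap + rng.normal()
    z[t] = z[t - 1] + rng.normal()

def ols(X, target):
    """Least-squares coefficients of target on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

# Step 2: long-run regression y_t = b0 + b1*z_t + e_t; save the residuals.
X_lr = np.column_stack([np.ones(T), z])
b0, b1 = ols(X_lr, y)
e_hat = y - b0 - b1 * z

# Step 3: near VAR in first differences with the lagged residual and one lag.
dy, dz = np.diff(y), np.diff(z)  # dy[k] is the change from period k to k+1
X = np.column_stack([np.ones(T - 2), e_hat[1:-1], dy[:-1], dz[:-1]])
coef_y = ols(X, dy[1:])          # [const, alpha_y, a11(1), a12(1)]
coef_z = ols(X, dz[1:])          # [const, alpha_z, a21(1), a22(1)]

print("estimated b1:", round(b1, 3))
print("alpha_y:", round(coef_y[1], 3), "alpha_z:", round(coef_z[1], 3))
```

Because both equations share the same regressor matrix `X`, equation-by-equation OLS is all that is needed, which mirrors the first point in the list above.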


STEP 4: Assess model adequacy. There are several procedures that can help determine
whether the estimated error-correction model is appropriate.


1. You should be careful to assess the adequacy of the model by performing
diagnostic checks to determine whether the residuals of the near
VAR approximate white noise. If the residuals are serially correlated,
lag lengths may be too short. Reestimate the model using lag lengths
that yield serially uncorrelated errors. It may be that you need to allow
longer lags of some variables than of others.


2. The speed of adjustment coefficients $\alpha_y$ and $\alpha_z$ are of particular interest
in that they have important implications for the dynamics of the system.¹²
If we focus on (6.36), it is clear that for any given value of $\hat{e}_{t-1}$, a
large value of $\alpha_z$ is associated with a large value of $\Delta z_t$. If $\alpha_z$ is zero,
the change in $z_t$ does not at all respond to the deviation from long-run
equilibrium in $(t - 1)$. If $\alpha_z$ is zero and all $\alpha_{21}(i) = 0$, then it can be said
that $\{\Delta y_t\}$ does not Granger cause $\{\Delta z_t\}$. We know that one or both of
these coefficients should be significantly different from zero if the variables
are cointegrated. After all, if both $\alpha_y$ and $\alpha_z$ are zero, there is no
error correction and (6.35) and (6.36) comprise nothing more than a
VAR in first differences. Moreover, the absolute values of these speed
of adjustment coefficients must not be too large. The point estimates
should imply that $\Delta y_t$ and $\Delta z_t$ converge to the long-run equilibrium
relationship.¹³

3. As in a traditional VAR analysis, Lütkepohl and Reimers (1992) show
that innovation accounting (i.e., impulse responses and variance
decomposition analysis) can be used to obtain information concerning the
interactions among the variables. As a practical matter, the two innovations
$\varepsilon_{yt}$ and $\varepsilon_{zt}$ may be contemporaneously correlated if $y_t$ has a
contemporaneous effect on $z_t$ and/or $z_t$ has a contemporaneous effect on $y_t$.
In obtaining impulse response functions and variance decompositions,
some method, such as Choleski decomposition, can be used to
orthogonalize the innovations.


The shape of the impulse response functions and results of the variance
decompositions can indicate whether the dynamic responses of the variables
conform to theory. Since all variables in (6.35) and (6.36) are $I(0)$,
the impulse responses should converge to zero. You should reexamine
your results from each step if you obtain a nondecaying or explosive
impulse response function.
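The orthogonalization mentioned in item 3 can be illustrated with a minimal sketch. The covariance matrix `sigma` below is a made-up example of correlated innovations, not an estimate from any model in the text; in practice it would be computed from the residuals of (6.35) and (6.36).

```python
import numpy as np

# Assumed covariance matrix of the innovations (e_yt, e_zt); illustrative values.
sigma = np.array([[1.0, 0.4],
                  [0.4, 0.5]])

# Choleski factor: lower-triangular P with sigma = P P'.
P = np.linalg.cholesky(sigma)

# Orthogonalized shocks u_t = P^{-1} e_t have an identity covariance matrix.
P_inv = np.linalg.inv(P)
cov_u = P_inv @ sigma @ P_inv.T

# Column j of P is the contemporaneous response of (y, z) to a unit shock in u_j.
# With this ordering, the first shock moves both variables on impact while the
# second shock moves only z.
print("P =\n", P)
print("cov(u) =\n", np.round(cov_u, 12))
```

Reversing the order of the variables changes `P`, so the resulting impulse responses depend on the ordering whenever the off-diagonal element of `sigma` is not zero.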


5. ILLUSTRATING THE ENGLE-GRANGER 
METHODOLOGY 


Figure 6.2 shows three simulated variables that can be used to illustrate the Engle- 
Granger procedure. Inspection of the figure suggests that each is nonstationary and 
there is no visual evidence that any pair is cointegrated. As detailed in Table 6.1, 
each series is constructed as the sum of a stochastic trend component plus an au- 
toregressive irregular component. 


Figure 6.2 Three cointegrated series. [Plot of the $\{y_t\}$, $\{z_t\}$, and $\{w_t\}$ sequences against observations 0 through 100.]


The first column of the table contains the formulas used to construct the $\{y_t\}$
sequence. First, 150 realizations of a white-noise process were drawn to represent the
$\{\varepsilon_{yt}\}$ sequence. Initializing $\mu_{y0} = 0$, we constructed 150 values of the random walk
process $\{\mu_{yt}\}$ using the formula $\mu_{yt} = \mu_{yt-1} + \varepsilon_{yt}$ (see the first cell of the table).
Another 150 realizations of a white-noise process were drawn to represent the $\{\eta_{yt}\}$
sequence; given the initial condition $\delta_{y0} = 0$, these realizations were used to
construct $\{\delta_{yt}\}$ as $\delta_{yt} = 0.5\delta_{yt-1} + \eta_{yt}$ (see the next lower cell). Adding the two
constructed series yields 150 realizations for $\{y_t\}$. To help ensure randomness, only the
last 100 observations are used in the simulated study.

The $\{z_t\}$ sequence was constructed in a similar fashion; the $\{\varepsilon_{zt}\}$ and $\{\eta_{zt}\}$
sequences are each represented by two different sets of 150 random numbers. The
trend $\{\mu_{zt}\}$ and autoregressive irregular term $\{\delta_{zt}\}$ were constructed as shown in the
second column of the table. The $\{\delta_{zt}\}$ sequence can be thought of as a pure irregular
component in the $\{z_t\}$ sequence. In order to introduce correlation between the
$\{y_t\}$ and $\{z_t\}$ sequences, the irregular component in $\{z_t\}$ was constructed as the sum
$\delta_{zt} + 0.5\delta_{yt}$. In the third column, you can see that the trend in $\{w_t\}$ is the simple
summation of the trends in the other two series. As such, the three series have the
cointegrating vector (1, 1, -1). The irregular component in $\{w_t\}$ is the sum of the pure
innovation $\delta_{wt}$ and 50% of the innovations $\delta_{yt}$ and $\delta_{zt}$.


Table 6.1 The Simulated Series

                  $\{y_t\}$                                   $\{z_t\}$                                   $\{w_t\}$
Trend             $\mu_{yt} = \mu_{yt-1} + \varepsilon_{yt}$             $\mu_{zt} = \mu_{zt-1} + \varepsilon_{zt}$             $\mu_{wt} = \mu_{yt} + \mu_{zt}$
Pure Irregular    $\delta_{yt} = 0.5\delta_{yt-1} + \eta_{yt}$          $\delta_{zt} = 0.5\delta_{zt-1} + \eta_{zt}$          $\delta_{wt} = 0.5\delta_{wt-1} + \eta_{wt}$
Series            $y_t = \mu_{yt} + \delta_{yt}$                    $z_t = \mu_{zt} + \delta_{zt} + 0.5\delta_{yt}$        $w_t = \mu_{wt} + \delta_{wt} + 0.5\delta_{yt} + 0.5\delta_{zt}$
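The construction in Table 6.1 can be reproduced with a short simulation. This is a sketch under stated assumptions: the white-noise draws are taken to be standard normal, the seed is arbitrary, and the helper names `random_walk` and `ar1` are invented here, so the numbers will not match the series plotted in Figure 6.2.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed (assumption)
N, KEEP = 150, 100                # draw 150 values, keep the last 100

def random_walk(eps):
    """mu_t = mu_{t-1} + eps_t with mu_0 = 0."""
    return np.cumsum(eps)

def ar1(eta, rho=0.5):
    """delta_t = rho*delta_{t-1} + eta_t with delta_0 = 0."""
    out = np.zeros(len(eta))
    prev = 0.0
    for t, shock in enumerate(eta):
        prev = rho * prev + shock
        out[t] = prev
    return out

# Trends: two independent random walks, and their sum for w.
mu_y = random_walk(rng.normal(size=N))
mu_z = random_walk(rng.normal(size=N))
mu_w = mu_y + mu_z

# Pure irregular components: three independent AR(1) processes.
d_y, d_z, d_w = (ar1(rng.normal(size=N)) for _ in range(3))

y = mu_y + d_y
z = mu_z + d_z + 0.5 * d_y
w = mu_w + d_w + 0.5 * d_y + 0.5 * d_z

y, z, w = y[-KEEP:], z[-KEEP:], w[-KEEP:]

# The combination y + z - w removes both trends, leaving only stationary terms.
combo = y + z - w
print("std of y, z, w:", y.std().round(2), z.std().round(2), w.std().round(2))
print("std of y + z - w:", combo.std().round(2))
```

Algebraically, `combo` equals `d_y + 0.5*d_z - d_w`, which is why (1, 1, -1) is a cointegrating vector for this design.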

Now pretend that we do not know the data-generating process. The issue is
whether the Engle-Granger methodology can uncover the essential details of the
data-generating process. The first step is to pretest the variables in order to determine
their order of integration. Consider the augmented Dickey-Fuller regression
equation for $\{y_t\}$:


$\Delta y_t = \alpha_0 + \alpha_1 y_{t-1} + \sum_{i=1}^{n} \alpha_{i+1}\Delta y_{t-i} + \varepsilon_t$


If the data happened to be quarterly, it would be natural to perform the augmented
Dickey-Fuller tests using lag lengths that are multiples of 4 (i.e., n = 4, 8,
...). For each series, the results of the Dickey-Fuller test and augmented test using
four lags are reported in Table 6.2.

With 100 observations and a constant, the 95% critical value of the Dickey-Fuller
test is -2.89. Since, in absolute value, all t-statistics are well below this critical
value, we cannot reject the null hypothesis of a unit root in any of the series. Of
course, if there was any serious doubt about the presence of a unit root, we could
use the procedures in Chapter 4 to (1) test for the presence of the constant term, (2)
test for the presence of a deterministic trend, and/or (3) perform Phillips-Perron
tests if the errors do not appear to be white noise. If various lag lengths yield different
results, we would want to test for the most appropriate lag length.
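The pretest regression can be computed directly by least squares. The function below is a minimal sketch: `adf_tstat` is a name invented for this example, and the series it is applied to is a freshly simulated random walk rather than the data behind Table 6.2. It returns only the t-statistic on $\alpha_1$, which must then be compared with Dickey-Fuller critical values such as the -2.89 quoted above.

```python
import numpy as np

def adf_tstat(x, lags=0):
    """t-statistic on a1 in  dx_t = a0 + a1*x_{t-1} + sum_i a_{i+1}*dx_{t-i} + e_t."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)                        # dx[k] = x[k+1] - x[k]
    rows = len(dx) - lags                  # usable observations
    cols = [np.ones(rows), x[lags:-1]]     # constant and lagged level
    for i in range(1, lags + 1):           # lagged differences dx_{t-i}
        cols.append(dx[lags - i:len(dx) - i])
    X = np.column_stack(cols)
    target = dx[lags:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    s2 = resid @ resid / (rows - X.shape[1])   # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)          # OLS covariance matrix
    return coef[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)             # arbitrary seed (assumption)
random_walk = np.cumsum(rng.normal(size=100))
for lags in (0, 4):
    print(f"lags={lags}: t-statistic = {adf_tstat(random_walk, lags):.3f}")
```

The statistic is an ordinary OLS t-ratio; only its reference distribution is nonstandard, which is why the usual t-table cannot be used.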

Table 6.2 Estimated $\alpha_1$ and the Associated t-statistic

                No lags        4 Lags
$\Delta y_t$      -0.01995       -0.02691
                (-0.74157)     (-1.0465)
$\Delta z_t$      -0.02069       -0.25841
                (-0.99213)     (-1.1437)
$\Delta w_t$      -0.03501       -0.03747
                (-1.9078)      (-1.9335)


The luxury of using simulated data is that we can avoid these potentially sticky
problems and move on to Step 2.¹⁴ Since all three variables are presumed to be
jointly determined, the long-run equilibrium regression can be estimated using
either $y_t$, $z_t$, or $w_t$ as the "left-hand-side" variable. The three estimates of the long-run
relationship (with t-values in parentheses) are


$y_t = 0.4843 - 0.9273z_t + 0.97687w_t + e_{yt}$    (6.37)
      (0.5751)  (-38.095)  (53.462)

$z_t = 0.0589 - 1.0108y_t + 1.02549w_t + e_{zt}$    (6.38)
      (0.6709)  (-38.095)  (65.323)

$w_t = -0.0852 + 0.9901y_t + 0.95347z_t + e_{wt}$    (6.39)
      (-1.0089)  (52.462)  (65.462)


where $e_{yt}$, $e_{zt}$, and $e_{wt}$ = the residuals from the three equilibrium regressions


The essence of the test is to determine whether the residuals from the equilibrium 
regression are stationary. Again, in performing the test, there is no presumption that 
any one of the three residual series is preferable to any of the others. If we use each 
of the three series to estimate an equation in the form of (6.31) and (6.32), the esti- 
mated values of a, are as given in Table 6.3. 

Engle and Yoo (1987) report the critical value of the t-statistic as -3.93. Hence,
using any one of the three equilibrium regressions, we can conclude that the series
are cointegrated of order (1, 1). Fortunately, all three equilibrium regressions yield
this same conclusion. We should be very wary of a result indicating that the variables
are cointegrated using one variable for the normalization, but are not cointegrated
using another variable for the normalization. This possible ambiguity is a
weakness of the test; other methods can be tried if mixed results are found.
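The dependence on the normalization can be explored with a sketch that reruns the residual-based test once for each choice of left-hand-side variable. Everything here is an assumption made for illustration: the series are re-simulated following the Table 6.1 design with standard normal shocks and an arbitrary seed, `eg_tstat` is an invented helper, and only the no-lag regression (6.31) is estimated. The threshold -3.93 is the Engle and Yoo value quoted above.

```python
import numpy as np

rng = np.random.default_rng(7)    # arbitrary seed (assumption)
N = 150

def ar1(eta, rho=0.5):
    """delta_t = rho*delta_{t-1} + eta_t with delta_0 = 0."""
    out, prev = [], 0.0
    for shock in eta:
        prev = rho * prev + shock
        out.append(prev)
    return np.array(out)

mu_y = np.cumsum(rng.normal(size=N))
mu_z = np.cumsum(rng.normal(size=N))
d_y, d_z, d_w = (ar1(rng.normal(size=N)) for _ in range(3))
series = {
    "y": (mu_y + d_y)[-100:],
    "z": (mu_z + d_z + 0.5 * d_y)[-100:],
    "w": (mu_y + mu_z + d_w + 0.5 * d_y + 0.5 * d_z)[-100:],
}

def eg_tstat(lhs, rhs):
    """Equilibrium regression of lhs on a constant and rhs, then the
    no-intercept regression  d(e_t) = a1*e_{t-1} + error  on its residuals."""
    X = np.column_stack([np.ones(len(lhs))] + rhs)
    beta, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    e = lhs - X @ beta
    de, lag = np.diff(e), e[:-1]
    a1 = (lag @ de) / (lag @ lag)
    resid = de - a1 * lag
    se = np.sqrt(resid @ resid / (len(de) - 1) / (lag @ lag))
    return a1 / se

CRITICAL = -3.93                  # value quoted in the text
tstats = {}
for name in series:
    others = [v for k, v in series.items() if k != name]
    tstats[name] = eg_tstat(series[name], others)
    verdict = "reject" if tstats[name] < CRITICAL else "cannot reject"
    print(f"{name} on left-hand side: t = {tstats[name]:.3f} -> {verdict} no cointegration")
```

The three statistics generally differ in finite samples, which is the ambiguity the paragraph above warns about.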

Avoid the temptation to conduct significance tests in (6.37) through (6.39). The
coefficients do not have asymptotic t-distributions unless the right-hand-side variables
are actually independent and there exists a single cointegrating vector.

Step 3 entails estimating the error-correction model. Consider the first-order system
shown with t-statistics in parentheses:


$\Delta y_t = 0.009 + 0.441e_{wt-1} + 0.190\Delta y_{t-1} + 0.332\Delta z_{t-1} - 0.380\Delta w_{t-1} + \varepsilon_{yt}$    (6.40)
       (0.291)  (2.94)  (1.15)  (2.05)  (-2.35)

$\Delta z_t = 0.042 + 0.054e_{wt-1} + 0.139\Delta y_{t-1} + 0.253\Delta z_{t-1} - 0.304\Delta w_{t-1} + \varepsilon_{zt}$    (6.41)
       (-1.11)  (0.304)  (0.711)  (1.32)  (1.59)

$\Delta w_t = -0.041 - 0.065e_{wt-1} + 0.157\Delta y_{t-1} + 0.302\Delta z_{t-1} - 0.421\Delta w_{t-1} + \varepsilon_{wt}$    (6.42)
       (0.31)  (0.907)  (0.688)  (1.35)  (-1.88)


where $e_{wt-1} = w_{t-1} + 0.0852 - 0.9901y_{t-1} - 0.95347z_{t-1}$


That is, $e_{wt-1}$ is the lagged value of the residual from (6.39).

Equations (6.40) through (6.42) comprise a first-order VAR augmented with
the single error-correction term $e_{wt-1}$. Again, there is an area of ambiguity since the
residuals from any of the "equilibrium" relationships could have been used in
the estimation. The signs of the speed of adjustment coefficients are in accord with
convergence toward the long-run equilibrium. In response to a positive discrepancy
in $e_{wt-1}$, both $y_t$ and $z_t$ tend to increase while $w_t$ tends to decrease. The
error-correction term, however, is significant only in (6.40).


Table 6.3 Estimated $a_1$ and the Associated t-statistic

                 No lags        4 Lags
$\Delta e_{yt}$     -0.44301       -0.59525
                 (-5.17489)     (-4.0741)
$\Delta e_{zt}$     -0.45195       -0.59344
                 (-5.37882)     (-4.2263)
$\Delta e_{wt}$     -0.45525       -0.60711
                 (-5.3896)      (-4.2247)

Finally, the diagnostic methods discussed in the last section should be applied to 
(6.40) through (6.42) in order to assess the model’s adequacy. If you use actual 
data, lag-length tests and the properties of the residuals need to be considered. 
Moreover, innovation accounting could help determine whether the model is ade- 
quate. These tests are not performed here since there is no economic theory associ- 
ated with the simulated data. 


6. COINTEGRATION AND PURCHASING-POWER PARITY 


Unfortunately, the simplicity of simulated data is rarely encountered in applied
econometrics. To illustrate the Engle-Granger methodology using "real world"
data, reconsider the theory of purchasing-power parity (PPP). Respectively, if $e_t$, $p_t^*$,
and $p_t$ denote the logarithms of the price of foreign exchange, foreign price level,
and domestic price level, long-run PPP requires that $e_t + p_t^* - p_t$ be stationary. The
unit root tests reported in Chapter 4 indicate that real exchange rates, defined as
$r_t = e_t + p_t^* - p_t$, appear to be nonstationary. Cointegration offers an alternative
method to check the theory; if PPP holds, the sequence formed by the sum
$\{e_t + p_t^*\}$ should be cointegrated with the $\{p_t\}$ sequence. Call the constructed dollar
value of the foreign price level $f_t$; that is, $f_t = e_t + p_t^*$. Long-run PPP asserts that
there exists a linear combination of the form $f_t = \beta_0 + \beta_1 p_t + \mu_t$ such that $\{\mu_t\}$ is
stationary and the cointegrating vector is such that $\beta_1 = 1$.

As reported in Chapter 4, in Enders (1988), I used price and exchange rate data
for Germany, Japan, Canada, and the United States for both the Bretton Woods
(1960-1971) and post-Bretton Woods (1973-1988) periods.¹⁵ Pretesting the data
indicated that for each period, the U.S. price level $\{p_t\}$ and the dollar values of the
foreign price levels $\{e_t + p_t^*\}$ each contained a single unit root. With differing orders
of integration, it would have been possible to immediately conclude that long-run
PPP failed.


The next step was to estimate the long-run equilibrium relation by regressing
each $f_t = e_t + p_t^*$ on $p_t$:


$f_t = \beta_0 + \beta_1 p_t + \mu_t$    (6.43)


Absolute PPP asserts that $f_t = p_t$, so that this version of the theory requires $\beta_0 = 0$
and $\beta_1 = 1$. The intercept $\beta_0$ is consistent with the relative version of PPP requiring
only that domestic and foreign price levels move proportionately to each other.
Unless there are compelling reasons to omit the constant, the recommended practice
is to include an intercept term in the equilibrium regression. In fact, Engle and
Granger's (1987) Monte Carlo simulations all include intercept terms.

The estimated values of $\beta_1$ and their associated standard errors are reported in
Table 6.4. Note that five of the six values are estimated to be quite a bit below
unity. Be especially careful not to make too much of these findings. It is not appropriate
to conclude that each value of $\beta_1$ is significantly different from unity simply
because the values of $(1 - \beta_1)$ exceed two or three standard deviations. The
assumptions underlying this type of t-test are not applicable here unless the variables
are actually cointegrated and $p_t$ is the independent variable.

The residuals from each regression equation, called $\{\hat{\mu}_t\}$, were checked for unit
roots. The unit root tests are straightforward since the residuals from a regression
equation have a zero mean and do not have a time trend. The following two equations
were estimated using the residuals from each long-run equilibrium relationship:


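$\Delta\hat{\mu}_t = a_1\hat{\mu}_{t-1} + \varepsilon_t$    (6.44)


and


$\Delta\hat{\mu}_t = a_1\hat{\mu}_{t-1} + \sum a_{i+1}\Delta\hat{\mu}_{t-i} + \varepsilon_t$    (6.45)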


Table 6.4 The Equilibrium Regressions

                        Germany      Japan        Canada
1973-1986
Estimated $\beta_1$       0.5374       0.8938       …
Standard error          (0.0415)     (0.0316)     (0.…
1960-1971
Estimated $\beta_1$       0.6660       0.7361       …
Standard error          (0.0262)     (0.0154)     (0.…


Table 6.5 reports the estimated values of $a_1$ from (6.44) and (6.45) using a lag
length of four. It bears repeating that failure to reject the null hypothesis $a_1 = 0$
means we cannot reject the null of no cointegration. Alternatively, if $-2 < a_1 < 0$, it
is possible to conclude that the $\{\hat{\mu}_t\}$ sequence does not have a unit root and the $\{f_t\}$
and $\{p_t\}$ sequences are cointegrated. Also note that it is not appropriate to use either
of the confidence intervals reported in Dickey and Fuller. The Dickey-Fuller
statistics are inappropriate because the residuals used in (6.44) and (6.45) are not
the actual error terms. Rather, these residuals are estimated error terms that are obtained
from the estimate of the equilibrium regression. If we knew the magnitudes
of the actual errors in each period, we could use the Dickey-Fuller tables.

Engle and Granger (1987) perform their own set of Monte Carlo experiments to
construct confidence intervals for $a_1$ in (6.44) and (6.45). Under the null
hypothesis $a_1 = 0$, the critical values for the t-statistic depend on whether or not
lags are appropriately included.¹⁶ The critical values at the 1, 5, and 10% significance
levels are given by


Critical Values for the Null of No Cointegration

              1%        5%        10%
No lags     -4.07     -3.37     -3.03
Lags        -3.73     -3.17     -2.91


Comparing the results of Table 6.5 with the critical values provided by Engle
and Granger indicates that for only Japan during the fixed exchange rate period is it
possible to reject the null hypothesis of no cointegration. By using four lags, the
t-statistic for the null $a_1 = 0$ is calculated to be -3.437. At the 5% significance level,
the critical value of t is -3.17. Hence, at the 5% significance level, we can reject the
null of no cointegration (i.e., accept the alternative that the variables are cointegrated)
and find in favor of PPP. For the other countries in each time period, we
cannot reject the null hypothesis of no cointegration and must conclude that PPP
generally failed.
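The comparison just described is mechanical and can be written as a small lookup. The sketch below uses the four-lag t-statistics from Table 6.5 and the "Lags" row of the critical values; the dictionary layout and the function name `rejects` are choices made for this example.

```python
# Four-lag t-statistics for the null a1 = 0, taken from Table 6.5.
t_stats_4lags = {
    ("Germany", "1973-1986"): -1.859,
    ("Japan", "1973-1986"): -2.212,
    ("Canada", "1973-1986"): -2.533,
    ("Germany", "1960-1971"): -1.468,
    ("Japan", "1960-1971"): -3.437,
    ("Canada", "1960-1971"): -1.663,
}

# Engle-Granger critical values when lags are included (from the text).
CRIT_LAGS = {0.01: -3.73, 0.05: -3.17, 0.10: -2.91}

def rejects(t_stat, level):
    """True when the null of no cointegration is rejected at the given level."""
    return t_stat < CRIT_LAGS[level]

rejected_5pct = [key for key, t in t_stats_4lags.items() if rejects(t, 0.05)]

for (country, period), t in t_stats_4lags.items():
    verdict = "reject" if rejects(t, 0.05) else "cannot reject"
    print(f"{country:8s} {period}: t = {t:6.3f} -> {verdict} at 5%")
print("Cointegrated at 5%:", rejected_5pct)
```

Because the test is one-sided, a rejection requires a t-statistic more negative than the critical value, not merely larger in absolute value than a conventional 1.96.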

The third step in the methodology entails estimation of the error-correction
model. Only the Japan/U.S. model needs estimation since it is the sole case for
which cointegration holds. The final error-correction models for Japanese and U.S.
price levels during the 1960 to 1971 period were estimated to be


$\Delta f_t = 0.00119 - 0.10548\hat{\mu}_{t-1}$    (6.46)
       (0.00044)  (0.04184)

$\Delta p_t = 0.00156 + 0.01114\hat{\mu}_{t-1}$    (6.47)
       (0.00033)  (0.03175)


where $\hat{\mu}_{t-1}$ is the lagged residual from the long-run equilibrium regression


That is, $\hat{\mu}_{t-1}$ is the estimated value of $f_{t-1} - \beta_0 - \beta_1 p_{t-1}$, and standard errors are in
parentheses.




Table 6.5 Dickey-Fuller Tests of the Residuals

                               Germany      Japan        Canada
1973-1986
  No lags
    Estimated $a_1$             -0.0225      -0.0151      -0.1001
    Standard error             (0.0169)     (0.0236)     (0.0360)
    t-statistic for $a_1 = 0$   -1.331       -0.640       -2.781
  4 lags
    Estimated $a_1$             -0.0316      -0.0522      -0.0983
    Standard error             (0.0170)     (0.0236)     (0.0388)
    t-statistic for $a_1 = 0$   -1.859       -2.212       -2.533
1960-1971
  No lags
    Estimated $a_1$             -0.0189      -0.1137      -0.0528
    Standard error             (0.0196)     (0.0449)     (0.0286)
    t-statistic for $a_1 = 0$   -0.966       -2.535       -1.846
  4 lags
    Estimated $a_1$             -0.0294      -0.1821      -0.0509
    Standard error             (0.0198)     (0.0530)     (0.0306)
    t-statistic for $a_1 = 0$   -1.468       -3.437       -1.663


Lag-length tests (see the discussion of $\chi^2$ and F-tests for lag lengths in the previous
chapter) indicated that lagged values of $\Delta f_{t-i}$ or $\Delta p_{t-i}$ did not need to be included
in the error-correction equations. Note that the point estimates in (6.46) and
(6.47) indicate a direct convergence to long-run equilibrium. For example, in the
presence of a one-unit deviation from long-run PPP in period $t - 1$, the Japanese
price level falls by 0.10548 units and the U.S. price level rises by 0.01114 units.
Both these price changes in period $t$ act to eliminate the positive discrepancy from
long-run PPP present in period $t - 1$.

Notice the discrepancy between the magnitudes of the two speed of adjustment
coefficients; in absolute value, the Japanese coefficient is approximately 10 times
that of the U.S. coefficient. As compared to the Japanese price level, the U.S. price
level responded only slightly to a deviation from PPP. Moreover, the error-correction
term is about 1/3 of a standard deviation from zero for the United States
(0.01114/0.03175 = 0.3509) and approximately 2.5 standard deviations from zero
for Japan (0.10548/0.04184 = 2.5210). Hence, at the 5% significance level, we can
conclude that the speed of adjustment term is insignificantly different from zero for
the United States but not for Japan. This result is consistent with the idea that the
United States was a large country relative to Japan: movements in U.S. prices
evolved independently of events in Japan, but movements in exchange-rate-adjusted
Japanese prices responded to events in the United States.
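The arithmetic behind these statements is easy to verify, and it also shows how the two adjustment coefficients combine. Since the deviation is $f_t - \beta_0 - \beta_1 p_t$, its change is $\Delta f_t - \beta_1 \Delta p_t$, so (6.46) and (6.47) imply that the deviation shrinks at the rate $\alpha_f - \beta_1\alpha_p$ per period. The sketch below uses the point estimates from (6.46), (6.47), and the Japan 1960-1971 entry of Table 6.4; the variable names and the half-life calculation are additions for illustration, and the intercepts are ignored.

```python
import math

beta_1 = 0.7361                        # Japan, 1960-1971, from Table 6.4
alpha_f, se_f = -0.10548, 0.04184      # speed of adjustment in (6.46)
alpha_p, se_p = 0.01114, 0.03175       # speed of adjustment in (6.47)

# Ratios of each coefficient to its standard error.
t_f = alpha_f / se_f
t_p = alpha_p / se_p

# Implied one-period change in the deviation from PPP per unit of deviation.
implied_a1 = alpha_f - beta_1 * alpha_p

# Periods needed for half of a deviation to disappear (illustrative measure).
half_life = math.log(0.5) / math.log(1.0 + implied_a1)

print(f"|t| for Japan: {abs(t_f):.4f}, for the United States: {t_p:.4f}")
print(f"implied a1: {implied_a1:.4f}")
print(f"half-life of a deviation: {half_life:.2f} periods")
```

The implied value is close to the no-lag estimate of $a_1$ reported for Japan in Table 6.5, which is what one would expect if the error-correction equations and the residual autoregression describe the same adjustment process.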




7. CHARACTERISTIC ROOTS, RANK, AND 
COINTEGRATION 


Although the Engle and Granger (1987) procedure is easily implemented, it does
have several important defects. The estimation of the long-run equilibrium regression
requires that the researcher place one variable on the left-hand side and use the
others as regressors. For example, in the case of two variables, it is possible to run
the Engle-Granger test for cointegration by using the residuals from either of the
following two "equilibrium" regressions:


$y_t = \beta_{10} + \beta_{11}z_t + e_{1t}$    (6.48)


or


$z_t = \beta_{20} + \beta_{21}y_t + e_{2t}$    (6.49)


As the sample size grows infinitely large, asymptotic theory indicates that the
test for a unit root in the $\{e_{1t}\}$ sequence becomes equivalent to the test for a unit
root in the $\{e_{2t}\}$ sequence. Unfortunately, the large sample properties on which this
result is derived may not be applicable to the sample sizes usually available to
economists. In practice, it is possible to find that one regression indicates the variables
are cointegrated, whereas reversing the order indicates no cointegration. This
is a very undesirable feature of the procedure since the test for cointegration should
be invariant to the choice of the variable selected for normalization. The problem is
obviously compounded using three or more variables since any of the variables can
be selected as the left-hand-side variable. Moreover, in tests using three or more
variables, we know that there may be more than one cointegrating vector. The
method has no systematic procedure for the separate estimation of the multiple
cointegrating vectors.
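A minimal Python sketch of why the choice of left-hand-side variable matters in a finite sample. The simulated series, the seed, and the helper name `cross_moments` are illustrative assumptions, not data from the text. In a two-variable regression the product of the two slope estimates equals the sample R-squared, so the residuals from (6.48) and (6.49) are rescaled copies of each other only when the fit is perfect.

```python
import random

def cross_moments(x, y):
    """Return the centered sums of squares and cross products of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy

random.seed(0)
# Hypothetical cointegrated pair: z is a random walk, y equals z plus stationary noise.
z, y = [0.0], [0.0]
for _ in range(100):
    z.append(z[-1] + random.gauss(0, 1))
    y.append(z[-1] + random.gauss(0, 1))

szz, syy, szy = cross_moments(z, y)
b11 = szy / szz                    # slope of y on z, as in (6.48)
b21 = szy / syy                    # slope of z on y, as in (6.49)
r_squared = szy ** 2 / (szz * syy)

print("b11 =", b11, "b21 =", b21, "b11 * b21 =", b11 * b21, "R^2 =", r_squared)
```

Because `b11 * b21` falls short of one whenever the fit is imperfect, the two residual series differ, and unit root tests applied to them need not agree.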


Another serious defect of the Engle-Granger procedure is that it relies on a two-step
estimator. The first step is to generate the error series $\{\hat{e}_t\}$ and the second step
uses these generated errors to estimate a regression of the form $\Delta\hat{e}_t = a_1\hat{e}_{t-1} + \cdots$.
Thus, the coefficient $a_1$ is obtained by estimating a regression using the residuals
from another regression. Hence, any error introduced by the researcher in Step 1 is
carried into Step 2. Fortunately, several methods have been developed that avoid
these problems. The Johansen (1988) and Stock and Watson (1988) maximum likelihood
estimators circumvent the use of two-step estimators and can estimate and
test for the presence of multiple cointegrating vectors. Moreover, these tests allow
the researcher to test restricted versions of the cointegrating vector(s) and speed of
adjustment parameters. Often, we want to test a theory by drawing statistical inferences
concerning the magnitudes of the estimated coefficients.

Both the Johansen (1988) and Stock and Watson (1988) procedures rely heavily
on the relationship between the rank of a matrix and its characteristic roots. The
Appendix to this chapter reviews the essentials of these concepts; those of you
wanting more details should review this appendix. For those wanting an intuitive
explanation, notice that the Johansen procedure is nothing more than a multivariate
generalization of the Dickey-Fuller test. In the univariate case, it is possible to
view the stationarity of $\{y_t\}$ as being dependent on the magnitude $(a_1 - 1)$, that is,


$$y_t = a_1 y_{t-1} + \varepsilon_t$$

or

$$\Delta y_t = (a_1 - 1) y_{t-1} + \varepsilon_t$$


If $(a_1 - 1) = 0$, the $\{y_t\}$ process has a unit root. Ruling out the case in which $\{y_t\}$
is explosive, if $(a_1 - 1) \neq 0$ we can conclude that the $\{y_t\}$ sequence is stationary.
The Dickey-Fuller tables provide the appropriate statistics to formally test the null
hypothesis $(a_1 - 1) = 0$. Now consider the simple generalization to n variables; as in
(6.26), let


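$$x_t = A_1 x_{t-1} + \varepsilon_t$$

so that

$$\begin{aligned}
\Delta x_t &= A_1 x_{t-1} - x_{t-1} + \varepsilon_t \\
&= (A_1 - I) x_{t-1} + \varepsilon_t \\
&= \pi x_{t-1} + \varepsilon_t
\end{aligned} \tag{6.50}$$

where $x_t$ and $\varepsilon_t$ are (n x 1) vectors
$A_1$ = an (n x n) matrix of parameters
$I$ = an (n x n) identity matrix
and $\pi$ is defined to be $(A_1 - I)$.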


As indicated in the discussion surrounding (6.27), the rank of $(A_1 - I)$ equals the
number of cointegrating vectors. By analogy to the univariate case, if $(A_1 - I)$ consists
of all zeros, so that rank($\pi$) = 0, all the $\{x_{it}\}$ sequences are unit root
processes. Since there is no linear combination of the $\{x_{it}\}$ processes that is stationary,
the variables are not cointegrated. If we rule out characteristic roots that are
greater than unity, if rank($\pi$) = n, (6.50) represents a convergent system of difference
equations, so that all variables are stationary.
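The three cases can be illustrated with a short Python sketch that uses NumPy's `numpy.linalg.matrix_rank`. The example matrices and the function name `describe_rank` are hypothetical choices made for illustration; the "stationary" label assumes, as in the text, that explosive roots have been ruled out.

```python
import numpy as np

def describe_rank(pi):
    """Classify the system dx_t = pi x_{t-1} + e_t by the rank of pi."""
    pi = np.asarray(pi, dtype=float)
    n = pi.shape[0]
    rank = int(np.linalg.matrix_rank(pi))
    if rank == 0:
        return rank, "unit roots, no cointegration"
    if rank == n:
        # Valid only when roots greater than unity are ruled out.
        return rank, "all variables stationary"
    return rank, f"{rank} cointegrating vector(s)"

examples = {
    "pi = 0": [[0.0, 0.0], [0.0, 0.0]],
    "rows proportional": [[-0.2, 0.2], [0.2, -0.2]],
    "full rank": [[-0.5, 0.1], [0.1, -0.5]],
}
for label, pi in examples.items():
    print(label, describe_rank(pi))
```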

There are several ways to generalize (6.50). The equation is easily modified to 
allow for the presence of a drift term; simply let 


$$\Delta x_t = A_0 + \pi x_{t-1} + \varepsilon_t \tag{6.51}$$

where $A_0$ = an (n x 1) vector of constants $(a_{10}, a_{20}, \ldots, a_{n0})'$


The effect of including the various $a_{i0}$ is to allow for the possibility of a linear
time trend in the data-generating process. You would want to include the drift term
if the variables exhibited a decided tendency to increase or decrease. Here, the rank
of $\pi$ can be viewed as the number of cointegrating relationships existing in the "detrended"
data. In the long run, $\pi x_{t-1} = 0$ so that each $\{\Delta x_{it}\}$ sequence has an expected
value of $a_{i0}$. Aggregating all such changes over t yields the deterministic expression
$a_{i0}t$.

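Figure 6.3 illustrates the effects of including a drift in the data-generating
process. Two random sequences with 100 observations each were generated; denote
these sequences as $\{\varepsilon_{yt}\}$ and $\{\varepsilon_{zt}\}$. Initializing $y_0 = z_0 = 0$, we constructed the
next 100 values of the $\{y_t\}$ and $\{z_t\}$ sequences as

$$\begin{bmatrix} \Delta y_t \\ \Delta z_t \end{bmatrix} =
\begin{bmatrix} -0.2 & 0.2 \\ 0.2 & -0.2 \end{bmatrix}
\begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix} +
\begin{bmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{bmatrix}$$

so that the cointegrating relationship is

$$-0.2y_{t-1} + 0.2z_{t-1} = 0$$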


In the top graph (a) of Figure 6.3, you can see that each sequence resembles a
random walk process and neither wanders too far from the other. The next graph
(b) adds drift coefficients such that $a_{10} = a_{20} = 0.1$; now, each series tends to increase
by 0.1 in each period. In addition to the fact that each sequence shares the
same stochastic trend, note that each has the same deterministic time trend also.
The fact that each has the same deterministic trend is not a result of the equivalence
between $a_{10}$ and $a_{20}$; the general solution to (6.51) necessitates that each have the
same linear trend. For verification, the next graph (c) of Figure 6.3 sets $a_{10} = 0.1$
and $a_{20} = 0.4$. Again, the sequences have the same stochastic and deterministic
trends. As an aside, note that increasing $a_{20}$ and decreasing $a_{10}$ would have an ambiguous
effect on the slope of the deterministic trend. This point will be important
in a moment; by appropriately manipulating the elements of $A_0$, it is possible to include
a constant in the cointegrating vector(s) without imparting a deterministic
time trend to the system.
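The shared deterministic trend can be checked with a small Python sketch. It assumes the coefficient matrix with rows (-0.2, 0.2) and (0.2, -0.2) and sets every shock to zero so that only the deterministic part of (6.51) remains; the function names are hypothetical. For this particular matrix the columns sum to zero, so $y_t + z_t$ grows by $a_{10} + a_{20}$ each period while the gap $y_t - z_t$ settles down, which implies a common slope of $(a_{10} + a_{20})/2$.

```python
def deterministic_path(a0, periods=200):
    """Iterate (6.51) for the two-variable example with all shocks set to zero."""
    pi = [[-0.2, 0.2], [0.2, -0.2]]
    y, z = 0.0, 0.0
    path = [(y, z)]
    for _ in range(periods):
        dy = a0[0] + pi[0][0] * y + pi[0][1] * z
        dz = a0[1] + pi[1][0] * y + pi[1][1] * z
        y, z = y + dy, z + dz
        path.append((y, z))
    return path

def final_slopes(path):
    """Per-period change in each series over the last step of the path."""
    (y_prev, z_prev), (y_last, z_last) = path[-2], path[-1]
    return y_last - y_prev, z_last - z_prev

for drift in ([0.1, 0.1], [0.1, 0.4], [0.1, -0.1]):
    print(drift, final_slopes(deterministic_path(drift)))
```

Raising $a_{20}$ while lowering $a_{10}$ leaves the sign of the change in the common slope open, which matches the ambiguity noted in the text.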

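One way to include a constant in the cointegrating relationships is to restrict the
values of the various $a_{i0}$. For example, if $\pi$ has rank($\pi$) = 1, the rows of $\pi$ can differ
only by a scalar, so that it is possible to write each $\{\Delta x_{it}\}$ sequence in (6.51) as

$$\begin{aligned}
\Delta x_{1t} &= \pi_{11}x_{1t-1} + \pi_{12}x_{2t-1} + \cdots + \pi_{1n}x_{nt-1} + a_{10} + \varepsilon_{1t} \\
\Delta x_{2t} &= s_2(\pi_{11}x_{1t-1} + \pi_{12}x_{2t-1} + \cdots + \pi_{1n}x_{nt-1}) + a_{20} + \varepsilon_{2t} \\
&\;\;\vdots \\
\Delta x_{nt} &= s_n(\pi_{11}x_{1t-1} + \pi_{12}x_{2t-1} + \cdots + \pi_{1n}x_{nt-1}) + a_{n0} + \varepsilon_{nt}
\end{aligned}$$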




where $s_i$ = scalars such that $s_i\pi_{1j} = \pi_{ij}$


If the $a_{i0}$ can be restricted such that $a_{i0} = s_i a_{10}$, it follows that all the $\Delta x_{it}$ can be
written with the constant included in the cointegrating vector:


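$$\begin{aligned}
\Delta x_{1t} &= (\pi_{11}x_{1t-1} + \pi_{12}x_{2t-1} + \cdots + \pi_{1n}x_{nt-1} + a_{10}) + \varepsilon_{1t} \\
\Delta x_{2t} &= s_2(\pi_{11}x_{1t-1} + \pi_{12}x_{2t-1} + \cdots + \pi_{1n}x_{nt-1} + a_{10}) + \varepsilon_{2t} \\
&\;\;\vdots \\
\Delta x_{nt} &= s_n(\pi_{11}x_{1t-1} + \pi_{12}x_{2t-1} + \cdots + \pi_{1n}x_{nt-1} + a_{10}) + \varepsilon_{nt}
\end{aligned}$$

or in compact form,

$$\Delta x_t = \pi^* x^*_{t-1} + \varepsilon_t \tag{6.52}$$

where $x_t = (x_{1t}, x_{2t}, \ldots, x_{nt})'$
$x^*_{t-1} = (x_{1t-1}, x_{2t-1}, \ldots, x_{nt-1}, 1)'$

$$\pi^* = \begin{bmatrix}
\pi_{11} & \pi_{12} & \cdots & \pi_{1n} & a_{10} \\
\pi_{21} & \pi_{22} & \cdots & \pi_{2n} & a_{20} \\
\vdots & \vdots & & \vdots & \vdots \\
\pi_{n1} & \pi_{n2} & \cdots & \pi_{nn} & a_{n0}
\end{bmatrix}$$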


The interesting feature of (6.52) is that the linear trend is purged from the system.
In essence, the various $a_{i0}$ have been altered in such a way that the general solution
for each $\{x_{it}\}$ does not contain a time trend. The solution to the set of difference
equations represented by (6.52) is such that all $\Delta x_{it}$ are expected to equal zero
when $\pi_{11}x_{1t-1} + \pi_{12}x_{2t-1} + \cdots + \pi_{1n}x_{nt-1} + a_{10} = 0$.

To highlight the difference between (6.51) and (6.52), the last graph (d) of
Figure 6.3 illustrates the consequences of setting $a_{10} = 0.1$ and $a_{20} = -0.1$. You can
see that neither sequence contains a deterministic trend. In fact, for the data shown
in the figure, the trend will vanish so long as we select values of the drift terms
maintaining the relationship $a_{10} = -a_{20}$. (You are asked to demonstrate this result in
the Questions and Exercises section at the end of this chapter.)
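A short Python sketch of the restriction $a_{i0} = s_i a_{10}$ for the two-variable example. It assumes a rank-one matrix whose first element is nonzero, and the function name `intercept_fits_in_vector` is hypothetical. With rows (-0.2, 0.2) and (0.2, -0.2), the second row is $s_2 = -1$ times the first, so the constant can be moved inside the cointegrating vector only when $a_{20} = -a_{10}$.

```python
def intercept_fits_in_vector(pi, a0, tol=1e-12):
    """Check a_i0 = s_i * a_10, where row i of a rank-one pi is s_i times row 1."""
    # Assumes pi has rank one and pi[0][0] is nonzero.
    scalars = [row[0] / pi[0][0] for row in pi]   # s_i, with s_1 = 1
    return all(abs(a - s * a0[0]) <= tol for a, s in zip(a0, scalars))

pi = [[-0.2, 0.2], [0.2, -0.2]]
print(intercept_fits_in_vector(pi, [0.1, -0.1]))  # drifts of panel (d)
print(intercept_fits_in_vector(pi, [0.1, 0.4]))   # drifts of panel (c)
```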

As with the augmented Dickey—Fuller test, the multivariate model can also be 
generalized to allow for a higher-order autoregressive process. Consider 


$$x_t = A_1x_{t-1} + A_2x_{t-2} + \cdots + A_px_{t-p} + \varepsilon_t \tag{6.53}$$


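where $x_t$ = the (n x 1) vector $(x_{1t}, x_{2t}, \ldots, x_{nt})'$
$\varepsilon_t$ = an independently and identically distributed n-dimensional vector
with zero mean and variance matrix $\Sigma_\varepsilon$

Figure 6.3 Drifts and intercepts in cointegrating relationships. (a) No drift or intercept
($a_{10} = a_{20} = 0$). (b) and (c) Drifts in the cointegrating relationship
($a_{10} = a_{20} = 0.1$ in (b); $a_{10} = 0.1$ and $a_{20} = 0.4$ in (c)). (d) Intercept in the
cointegrating vector ($a_{10} = 0.1$ and $a_{20} = -0.1$).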


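Equation (6.53) can be put in a more usable form by subtracting $x_{t-1}$ from each
side to obtain

$$\Delta x_t = (A_1 - I)x_{t-1} + A_2x_{t-2} + A_3x_{t-3} + \cdots + A_px_{t-p} + \varepsilon_t$$

Now add and subtract $(A_1 - I)x_{t-2}$ to obtain

$$\Delta x_t = (A_1 - I)\Delta x_{t-1} + (A_2 + A_1 - I)x_{t-2} + A_3x_{t-3} + \cdots + A_px_{t-p} + \varepsilon_t$$

Next add and subtract $(A_2 + A_1 - I)x_{t-3}$ to obtain

$$\Delta x_t = (A_1 - I)\Delta x_{t-1} + (A_2 + A_1 - I)\Delta x_{t-2} + (A_3 + A_2 + A_1 - I)x_{t-3} + \cdots + A_px_{t-p} + \varepsilon_t$$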


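Continuing in this fashion, we obtain

$$\Delta x_t = \sum_{i=1}^{p-1}\pi_i\Delta x_{t-i} + \pi x_{t-p} + \varepsilon_t \tag{6.54}$$

where $\pi = -\left(I - \sum_{i=1}^{p}A_i\right)$ and $\pi_i = -\left(I - \sum_{j=1}^{i}A_j\right)$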


Again, the key feature to note in (6.54) is the rank of the matrix $\pi$; the rank of $\pi$
is equal to the number of independent cointegrating vectors. Clearly, if rank($\pi$) = 0,
the matrix is null and (6.54) is the usual VAR model in first differences. Instead, if
$\pi$ is of rank n, the vector process is stationary. In intermediate cases, if rank($\pi$) = 1,
there is a single cointegrating vector and the expression $\pi x_{t-p}$ is the error-correction
factor. For other cases in which 1 < rank($\pi$) < n, there are multiple cointegrating
vectors.
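The algebra leading to (6.54) can be verified numerically. The Python sketch below uses NumPy and three hypothetical coefficient matrices for a two-variable VAR(3); it builds $\pi_i$ and $\pi$ from the partial sums of the $A_i$ and confirms that the levels form (6.53) and the error-correction form (6.54) give the same $\Delta x_t$. The function name `ecm_matrices` and all numbers are assumptions made for illustration.

```python
import numpy as np

def ecm_matrices(a_list):
    """Return ([pi_1, ..., pi_{p-1}], pi) for x_t = A_1 x_{t-1} + ... + A_p x_{t-p} + e_t."""
    n = a_list[0].shape[0]
    partial = np.cumsum(np.stack(a_list), axis=0)   # A_1, A_1 + A_2, ...
    pis = [partial[i] - np.eye(n) for i in range(len(a_list))]
    return pis[:-1], pis[-1]

rng = np.random.default_rng(0)
# Hypothetical coefficient matrices A_1, A_2, A_3.
a_list = [np.array([[0.5, 0.1], [0.0, 0.4]]),
          np.array([[0.2, 0.0], [0.1, 0.2]]),
          np.array([[0.1, 0.1], [0.0, 0.1]])]
p = len(a_list)
short_run, pi = ecm_matrices(a_list)

x = [rng.normal(size=2) for _ in range(p)]   # x[-1] is x_{t-1}, x[-p] is x_{t-p}
eps = rng.normal(size=2)

# Levels form (6.53): x_t = sum_j A_j x_{t-j} + e_t.
x_t = sum(a @ x[-1 - j] for j, a in enumerate(a_list)) + eps

# Error-correction form (6.54): sum_i pi_i dx_{t-i} + pi x_{t-p} + e_t.
dx_ecm = sum(p_i @ (x[-i] - x[-i - 1]) for i, p_i in enumerate(short_run, start=1))
dx_ecm = dx_ecm + pi @ x[-p] + eps

max_gap = float(np.max(np.abs((x_t - x[-1]) - dx_ecm)))
print("largest difference between the two forms:", max_gap)
```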

As detailed in the appendix, the number of distinct cointegrating vectors can be
obtained by checking the significance of the characteristic roots of $\pi$. We know that
the rank of a matrix is equal to the number of its characteristic roots that differ from
zero. Suppose we obtained the matrix $\pi$ and ordered the n characteristic roots such
that $\lambda_1 > \lambda_2 > \cdots > \lambda_n$. If the variables in $x_t$ are not cointegrated, the rank of $\pi$ is zero
and all these characteristic roots will equal zero. Since ln(1) = 0, each of the expressions
$\ln(1 - \lambda_i)$ will equal zero if the variables are not cointegrated. Similarly, if the
rank of $\pi$ is unity, $0 < \lambda_1 < 1$ so that the first expression $\ln(1 - \lambda_1)$ will be negative
and all the other $\lambda_i = 0$ so that $\ln(1 - \lambda_2) = \ln(1 - \lambda_3) = \cdots = \ln(1 - \lambda_n) = 0$.

In practice, we can obtain only estimates of $\pi$ and the characteristic roots. The
test for the number of characteristic roots that are insignificantly different from
zero can be conducted using the following two test statistics:

$$\lambda_{trace}(r) = -T\sum_{i=r+1}^{n}\ln(1 - \hat{\lambda}_i) \tag{6.55}$$

$$\lambda_{max}(r, r+1) = -T\ln(1 - \hat{\lambda}_{r+1}) \tag{6.56}$$

where $\hat{\lambda}_i$ = the estimated values of the characteristic roots (also called eigenvalues)
obtained from the estimated $\pi$ matrix
T = the number of usable observations

When the appropriate values of r are clear, these statistics are simply referred to as
$\lambda_{trace}$ and $\lambda_{max}$.

The first statistic tests the null hypothesis that the number of distinct cointegrating
vectors is less than or equal to r against a general alternative. From the previous
discussion, it should be clear that $\lambda_{trace}$ equals zero when all $\hat{\lambda}_i = 0$. The further the
estimated characteristic roots are from zero, the more negative is $\ln(1 - \hat{\lambda}_i)$ and the
larger the $\lambda_{trace}$ statistic. The second statistic tests the null that the number of cointegrating
vectors is r against the alternative of r + 1 cointegrating vectors. Again, if
the estimated value of the characteristic root is close to zero, $\lambda_{max}$ will be small.

Johansen and Juselius (1990) provide the critical values of the $\lambda_{trace}$ and $\lambda_{max}$ statistics
obtained using simulation studies. The critical values are reproduced in Table
B at the end of this text. The distribution of these statistics depends on:


1. The number of nonstationary components under the null hypothesis (i.e., n — r). 


2. The form of the vector $A_0$. Use the middle portion of Table B if you do not include
a constant in the cointegrating vector or a drift term. Use the top portion of
the table if you include the drift term $A_0$. Use the bottom portion of the table if
you include a constant in the cointegrating vector.


Using quarterly data for Denmark over the sample period 1974:1 to 1987:3,
Johansen and Juselius (1990) let the $x_t$ vector be represented by

$$x_t = (m2_t, y_t, i^d_t, i^b_t)'$$

where m2 = log of the real money supply as measured by M2 deflated by a price
index
y = log of real income
$i^d$ = deposit rate on money representing a direct return on money holding
$i^b$ = bond rate representing the opportunity cost of holding money


Including a constant in the cointegrating relationship (i.e., augmenting $x_{t-p}$ with a
constant), they report that the residuals from (6.54) appear to be serially uncorrelated.
The four characteristic roots of the estimated $\pi$ matrix are given in the first
column below.

Characteristic root           $\lambda_{max} = -T\ln(1 - \hat{\lambda}_{r+1})$    $\lambda_{trace} = -T\sum\ln(1 - \hat{\lambda}_i)$
$\hat{\lambda}_1$ = 0.4332    30.09                                               49.14
$\hat{\lambda}_2$ = 0.1776    10.36                                               19.05
$\hat{\lambda}_3$ = 0.1128     6.34                                                8.69
$\hat{\lambda}_4$ = 0.0434     2.35                                                2.35


The second column reports the various $\lambda_{max}$ statistics as the negative of the number of usable
observations (T = 53) multiplied by $\ln(1 - \hat{\lambda}_i)$. For example, $-53\ln(1 - 0.0434)$ =
2.35 and $-53\ln(1 - 0.1128)$ = 6.34. The last column reports the $\lambda_{trace}$ statistics as
the summation of the $\lambda_{max}$ statistics. Simple arithmetic reveals that 8.69 = 2.35 +
6.34 and 19.05 = 2.35 + 6.34 + 10.36.
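The arithmetic in the table can be reproduced with a few lines of Python. The sketch below implements (6.55) and (6.56) directly; the function names are hypothetical, and the roots and sample size are the values reported in the text. Because the published roots are rounded to four decimals, the trace statistics computed this way can differ from the tabulated sums in the second decimal place.

```python
import math

def lambda_max(roots, T, r):
    """Equation (6.56): -T ln(1 - lambda_{r+1}); roots sorted in descending order."""
    return -T * math.log(1.0 - roots[r])

def lambda_trace(roots, T, r):
    """Equation (6.55): -T times the sum of ln(1 - lambda_i) for i = r+1, ..., n."""
    return -T * sum(math.log(1.0 - lam) for lam in roots[r:])

danish_roots = [0.4332, 0.1776, 0.1128, 0.0434]   # values reported in the text
T = 53                                            # usable observations

for r in range(len(danish_roots)):
    print(f"r = {r}: lambda_max = {lambda_max(danish_roots, T, r):.2f}, "
          f"lambda_trace = {lambda_trace(danish_roots, T, r):.2f}")
```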

To test the null hypothesis r = 0 against the general alternative r = 1, 2, 3, or 4,
use the $\lambda_{trace}$ statistic. Since the null hypothesis is r = 0 and there are four variables
(i.e., n = 4), the summation in (6.55) runs from 1 to 4. If we sum over the four values,
the calculated value of $\lambda_{trace}$ is 49.14. Since Johansen and Juselius (1990) include
the constant in the cointegrating vector, this calculated value of 49.14 is compared
to the critical values reported in the bottom portion of Table B. For n - r = 4,
the critical values of $\lambda_{trace}$ are 49.925, 53.347, and 60.054 at the 90, 95, and 99%
levels, respectively. Since 49.14 is below 49.925, the null hypothesis cannot be rejected
even at the 90% level; the restriction is not binding, so that the
variables are not cointegrated using this test.

To make a point and give you practice in using the table, suppose you want to
test the null hypothesis $r \le 1$ against the alternative r = 2, 3, or 4. Under this null
hypothesis, the summation in (6.55) runs from 2 to 4, so that the calculated value of
$\lambda_{trace}$ is 19.05. For n - r = 3, the critical values of $\lambda_{trace}$ are 32.093, 35.068, and
40.198 at the 90, 95, and 99% levels, respectively. The restriction r = 0 or r = 1 is
not binding.

In contrast to the $\lambda_{trace}$ statistic, the $\lambda_{max}$ statistic has a specific alternative hypothesis.
To test the null hypothesis r = 0 against the specific alternative r = 1, use
Equation (6.56). The calculated value of the $\lambda_{max}(0, 1)$ statistic is $-53\ln(1 - 0.4332)$
= 30.09. For n - r = 4, the critical values of $\lambda_{max}$ are 25.611, 28.167,
30.262, and 33.121 at the 90, 95, 97.5, and 99% levels, respectively. Hence, it is
possible to reject the null hypothesis r = 0 at the 95% level (but not the 97.5%
level) and conclude that there is only one cointegrating vector (i.e., r = 1). Before
reading on, you should take a moment to examine the data and convince yourself
that the null hypothesis r = 1 against the alternative r = 2 cannot be rejected at conventional
levels. You should find that the calculated value of the $\lambda_{max}$ statistic for r
= 1 is 10.36 and the critical value at the 90% level is 19.796. Hence, there is no significant
evidence of more than one cointegrating vector.

The example illustrates the important point that the results of the $\lambda_{max}$ and $\lambda_{trace}$
tests can conflict. The $\lambda_{max}$ test has the sharper alternative hypothesis. It is usually
preferred for trying to pin down the number of cointegrating vectors.


8. HYPOTHESIS TESTING IN A COINTEGRATION 
FRAMEWORK 


In the Dickey-Fuller tests discussed in Chapter 4, it was important to correctly ascertain
the form of the deterministic regressors. A similar situation applies in the
Johansen procedure. As you can see in Table B, the critical values of the $\lambda_{trace}$ and
$\lambda_{max}$ statistics are smallest with a drift term and largest with an intercept term included
in the cointegrating vector. Instead of cavalierly positing the form of $A_0$, it is
possible to test restricted forms of the vector.

One of the most interesting aspects of the Johansen procedure is that it allows for
testing restricted forms of the cointegrating vector(s). In a money demand study
you might want to test restrictions concerning the long-run proportionality between
money and prices, and the size of the income and interest rate elasticities of demand for
money. In terms of Equation (6.1) (i.e., $m_t = \beta_0 + \beta_1p_t + \beta_2y_t + \beta_3r_t + e_t$), the restrictions
of interest are: $\beta_1 = 1$, $\beta_2 > 0$, and $\beta_3 < 0$.

The key insight to all such hypothesis tests is that if there are r cointegrating
vectors, only these r linear combinations of the variables are stationary. All other
linear combinations are nonstationary. Thus, suppose you reestimate the model restricting
the parameters of $\pi$. If the restrictions are not binding, you should find that
the number of cointegrating vectors has not diminished.

To test for the presence of an intercept in the cointegrating vector as opposed to
the unrestricted drift $A_0$, estimate the two forms of the model. Denote the ordered
characteristic roots of the unrestricted $\pi$ matrix by $\hat{\lambda}_1, \hat{\lambda}_2, \ldots, \hat{\lambda}_n$ and the characteristic
roots of the model with the intercept(s) in the cointegrating vector(s) by $\hat{\lambda}^*_1, \hat{\lambda}^*_2,
\ldots, \hat{\lambda}^*_n$. Suppose that the unrestricted form of the model has r non-zero characteristic
roots. Asymptotically, the statistic:

$$-T\sum_{i=r+1}^{n}\left[\ln(1 - \hat{\lambda}^*_i) - \ln(1 - \hat{\lambda}_i)\right] \tag{6.57}$$

has a $\chi^2$ distribution with (n - r) degrees of freedom.

The intuition behind the test is that all values of $\ln(1 - \hat{\lambda}^*_i)$ and $\ln(1 - \hat{\lambda}_i)$ should
be equivalent if the restriction is not binding. Hence, small values for the test statistic
imply that it is permissible to include the intercept in the cointegrating vector.
However, the likelihood of finding a stationary linear combination of the n variables
is greater with the intercept in the cointegrating vector than if the intercept is
absent from the cointegrating vector. Thus, a large value of $\hat{\lambda}^*_{r+1}$ [and a corresponding
large value of $-T\ln(1 - \hat{\lambda}^*_{r+1})$] implies that the restriction artificially inflates the
number of cointegrating vectors. Thus, as proven by Johansen (1991), if the test
statistic is sufficiently large, it is possible to reject the null hypothesis of an intercept
in the cointegrating vector(s) and conclude that there is a linear trend in the
variables. This is precisely the case represented by the middle portion of Figure 6.3.

Johansen and Juselius (1990) test the restriction that their estimated Danish
money demand function does not have a drift. Since they found only one cointegrating
vector among m2, y, $i^d$, and $i^b$, set n = 4 and r = 1. The calculated value of
the $\chi^2$ statistic in (6.57) is 1.99. With three degrees of freedom, this is insignificant
at conventional levels; they conclude that the data do not have a linear time trend,
and find it appropriate to include the constant in the cointegrating vector.

In order to test restrictions on the cointegrating vector, Johansen defines the two
matrices $\alpha$ and $\beta$, both of dimension (n x r), where r is the rank of $\pi$. The properties
of $\alpha$ and $\beta$ are such that

$$\pi = \alpha\beta'$$

The matrix $\beta$ is the matrix of cointegrating parameters, and the matrix $\alpha$ the matrix
of weights with which each cointegrating vector enters the n equations of the
VAR. In a sense, $\alpha$ can be viewed as the matrix of the speed of adjustment parameters.
Due to the cross-equation restrictions, it is not possible to estimate $\alpha$ and $\beta$ using
OLS. However, with maximum likelihood estimation, it is possible to (1) estimate
(6.53) as an error-correction model; (2) determine the rank of $\pi$; (3) use the r
most significant cointegrating vectors to form $\beta'$; and (4) select $\alpha$ such that $\pi = \alpha\beta'$.
Question 5 at the end of this chapter asks you to find several such $\alpha$ and $\beta$ matrices.

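It is easy to understand the process in the case of a single cointegrating vector.
Given that rank($\pi$) = 1, the rows of $\pi$ are all linear multiples of each other. Hence,
the equations in (6.54) have the form:

$$\begin{aligned}
\Delta x_{1t} &= \cdots + \pi_{11}x_{1t-p} + \pi_{12}x_{2t-p} + \cdots + \pi_{1n}x_{nt-p} + \varepsilon_{1t} \\
\Delta x_{2t} &= \cdots + s_2(\pi_{11}x_{1t-p} + \pi_{12}x_{2t-p} + \cdots + \pi_{1n}x_{nt-p}) + \varepsilon_{2t} \\
&\;\;\vdots \\
\Delta x_{nt} &= \cdots + s_n(\pi_{11}x_{1t-p} + \pi_{12}x_{2t-p} + \cdots + \pi_{1n}x_{nt-p}) + \varepsilon_{nt}
\end{aligned}$$

where the $s_i$ are scalars

and for notational simplicity, the matrices $\pi_i\Delta x_{t-i}$ have not been written out.
Now define $\alpha_i = s_i\pi_{11}$ (with $s_1 = 1$) and $\beta_j = \pi_{1j}/\pi_{11}$, so that each equation can be written as

$$\Delta x_{it} = \cdots + \alpha_i(x_{1t-p} + \beta_2x_{2t-p} + \cdots + \beta_nx_{nt-p}) + \varepsilon_{it} \qquad (i = 1, \ldots, n)$$

or in matrix form,

$$\Delta x_t = \sum_{i=1}^{p-1}\pi_i\Delta x_{t-i} + \alpha\beta'x_{t-p} + \varepsilon_t \tag{6.58}$$

where the single cointegrating vector is $\beta = (1, \beta_2, \beta_3, \ldots, \beta_n)'$ and the speed of
adjustment parameters are given by $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)'$.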
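A minimal Python sketch of this factorization for a rank-one matrix. It assumes the first element of the matrix is nonzero so that the vector can be normalized on the first variable; the function names are hypothetical, and the example uses the coefficient matrix with rows (-0.2, 0.2) and (0.2, -0.2). The sketch also shows that the factorization is not unique: scaling $\alpha$ by c and $\beta$ by 1/c leaves the product unchanged, which is why a normalization is needed.

```python
def factor_rank_one(pi):
    """Split a rank-one pi into alpha and beta with beta normalized so beta_1 = 1."""
    beta = [v / pi[0][0] for v in pi[0]]   # beta_j = pi_1j / pi_11
    alpha = [row[0] for row in pi]         # alpha_i = s_i * pi_11 = pi_i1
    return alpha, beta

def outer(alpha, beta):
    """Form the matrix alpha beta'."""
    return [[a * b for b in beta] for a in alpha]

pi = [[-0.2, 0.2], [0.2, -0.2]]
alpha, beta = factor_rank_one(pi)
print("alpha =", alpha, "beta =", beta)

# The same pi results from alpha * c and beta / c for any nonzero c.
c = 2.0
rescaled = outer([a * c for a in alpha], [b / c for b in beta])
print("rescaled product =", rescaled)
```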


Once $\alpha$ and $\beta'$ are determined, testing various restrictions on $\alpha$ and $\beta'$ is straightforward
if you remember the fundamental point that if there are r cointegrating vectors,
only these r linear combinations of the variables are stationary. Thus, the test
statistics involve comparing the number of cointegrating vectors under the null and
alternative hypotheses. Again, let $\hat{\lambda}_1, \hat{\lambda}_2, \ldots, \hat{\lambda}_n$ and $\hat{\lambda}^*_1, \hat{\lambda}^*_2, \ldots, \hat{\lambda}^*_n$ denote the ordered
characteristic roots of the unrestricted and restricted models, respectively. To
test restrictions on $\beta$, form the test statistic:

$$T\sum_{i=1}^{r}\left[\ln(1 - \hat{\lambda}^*_i) - \ln(1 - \hat{\lambda}_i)\right] \tag{6.59}$$

Asymptotically, this statistic has a $\chi^2$ distribution with degrees of freedom equal
to the number of restrictions placed on $\beta$. Small values of $\hat{\lambda}^*_i$ relative to $\hat{\lambda}_i$ (for $i \le r$)
imply a reduced number of cointegrating vectors. Hence, the restriction embedded
in the null hypothesis is binding if the calculated value of the test statistic exceeds
that in a $\chi^2$ table.

For example, Johansen and Juselius test the restriction that money and income
move proportionally. Their estimated long-run equilibrium relationship is:

$$m2_t = 1.03y_t - 5.21i^b_t + 4.22i^d_t + 6.06$$

They restrict the coefficient of income to be unity and find the restricted values
of the $\hat{\lambda}^*_i$ to be such that

         $\hat{\lambda}^*_i$    $T\ln(1 - \hat{\lambda}^*_i)$
i = 1    0.433                  -30.04
i = 2    0.172                  -10.01
i = 3    0.044                   -2.36
i = 4    0.006                   -0.32

Given that the unrestricted model has r = 1 and $-T\ln(1 - \hat{\lambda}_1)$ = 30.09, (6.59) becomes:
-30.04 + 30.09 = 0.05. Since there is only 1 restriction imposed on $\beta$, the
test statistic has a $\chi^2$ distribution with 1 degree of freedom. A $\chi^2$ table indicates that
0.05 is not significant; hence, they conclude that the restriction is not binding.
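The statistic in (6.59) can be sketched in Python as follows. The function name is hypothetical. The Danish comparison uses the two reported terms directly, since recomputing them from roots rounded to three or four decimals would not reproduce 0.05. The second call uses an invented restricted root of 0.30 purely to show what a binding restriction looks like; 3.84 is the standard 5% critical value of a $\chi^2$ variable with one degree of freedom.

```python
import math

def restriction_statistic(T, restricted, unrestricted, r):
    """Equation (6.59): T * sum over i = 1..r of [ln(1 - l*_i) - ln(1 - l_i)]."""
    return T * sum(math.log(1.0 - rs) - math.log(1.0 - us)
                   for rs, us in zip(restricted[:r], unrestricted[:r]))

CHI2_1DF_5PCT = 3.84   # 5% critical value, one degree of freedom

# Danish example, using the reported values of T ln(1 - lambda*_1) and T ln(1 - lambda_1).
reported = -30.04 - (-30.09)
print("reported statistic:", round(reported, 2), "binding:", reported > CHI2_1DF_5PCT)

# Hypothetical case: the restriction cuts the largest root from 0.4332 to 0.30.
hypothetical = restriction_statistic(53, [0.30], [0.4332], r=1)
print("hypothetical statistic:", round(hypothetical, 2), "binding:", hypothetical > CHI2_1DF_5PCT)
```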

Restrictions on $\alpha$ can be tested in the same way. The procedure is to restrict $\alpha$
and compare the r most significant characteristic roots for the restricted and unrestricted
models using (6.59). If the calculated value of (6.59) exceeds that from a $\chi^2$
table, with degrees of freedom equal to the number of restrictions placed on $\alpha$, the
restrictions can be rejected. For example, Johansen and Juselius (1990) test the restriction
that only money demand (i.e., $m2_t$) responds to the deviation from long-run
equilibrium. Formally, they test the restriction that $\alpha_2 = \alpha_3 = \alpha_4 = 0$. Restricting
the three values of $\alpha_i$ to equal zero, they find the largest characteristic root in the
restricted model is such that $T\ln(1 - \hat{\lambda}^*_1) = -23.42$. Since the unrestricted model is
such that $T\ln(1 - \hat{\lambda}_1) = -30.09$, equation (6.59) becomes -23.42 - (-30.09) = 6.67.
The $\chi^2$ statistic with 3 degrees of freedom is 7.81 at the 5% significance level.
Hence, they find mild support for the hypothesis that the restriction is not binding.

If there is a single cointegrating vector, the Engle-Granger and Johansen methods
have the same asymptotic distribution. If it can be determined that only one
cointegrating vector exists, it is common to rely on the estimated error-correction
model to test restrictions on $\alpha$. If r = 1, and a single value of $\alpha$ is being tested, the
usual t-statistic is asymptotically equivalent to the Johansen test.


9. ILLUSTRATING THE JOHANSEN METHODOLOGY 


An interesting way to illustrate the Johansen methodology is to use exactly the 
same data shown in Figure 6.2. Although the Engle—Granger technique did find 
that the simulated data were cointegrated, a comparison of the two procedures is 
useful. Use the following four steps when implementing the Johansen procedure. 


STEP 1: Pretests and lag length. It is good practice to pretest all variables to assess
their order of integration. Plot the data to see if a linear time trend is likely
to be present in the data-generating process. Although forms of the
Johansen tests can detect differing orders of integration, it is wise not to
mix variables with different orders of integration.

The results of the test can be quite sensitive to the lag length so it is important
to be careful. The most common procedure is to estimate a vector
autoregression using the undifferenced data. Then use the same lag-length
tests as in a traditional VAR. Begin with the longest lag length deemed
reasonable and test whether the lag length can be shortened. For example,
if we want to test whether lags 2 through 4 are important, we can estimate
the following two VARs:

$$x_t = A_0 + A_1x_{t-1} + A_2x_{t-2} + A_3x_{t-3} + A_4x_{t-4} + e_{1t} \tag{6.60}$$

$$x_t = A_0 + A_1x_{t-1} + e_{2t} \tag{6.61}$$

where $x_t$ = the (n x 1) vector of variables
$A_0$ = (n x 1) matrix of intercept terms
$A_i$ = (n x n) matrices of coefficients
$e_{1t}$ and $e_{2t}$ = (n x 1) vectors of error terms


Estimate (6.60) with four lags of each variable in each equation and call
the variance/covariance matrix of residuals $\Sigma_4$. Now estimate (6.61) using
only one lag of each variable in each equation and call the variance/covariance
matrix of residuals $\Sigma_1$. Even though we are working with nonstationary
variables, the likelihood ratio test statistic recommended by Sims
(1980) is the same as that reported in Chapter 5:

$$(T - c)(\log|\Sigma_1| - \log|\Sigma_4|)$$

where T = number of observations
c = number of parameters in the unrestricted system
$\log|\Sigma_i|$ = natural logarithm of the determinant of $\Sigma_i$

Following Sims, use the $\chi^2$ distribution with degrees of freedom equal
to the number of coefficient restrictions. Since each $A_i$ has $n^2$ coefficients,
constraining $A_2 = A_3 = A_4 = 0$ entails $3n^2$ restrictions. Alternately, you can
select lag length p using the multivariate generalizations of the AIC or
SBC.


STEP 2: Estimate the model and determine the rank of $\pi$. Many time-series statistical
software packages contain a routine to estimate the model. Here, it suffices
to say that OLS is not appropriate since it is necessary to impose
cross-equation restrictions on the $\pi$ matrix. You may choose to estimate
the model in three forms: (1) with all elements of $A_0$ set equal to zero, (2)
with a drift, or (3) with a constant term in the cointegrating vector.

With the simulated data in Figure 6.2 such that $x_t = (y_t, z_t, w_t)'$, an intercept
term in the cointegrating vector(s) was included even though the data-generating
process did not contain an intercept. As we saw in the last section,
it is possible to test for the presence of the intercept. Lag-length tests
indicate setting p = 2, so that the estimated form of the model is

$$\Delta x_t = A_0 + \pi_1\Delta x_{t-1} + \pi x_{t-2} + \varepsilon_t \tag{6.62}$$

where the drift term $A_0$ was constrained so as to force the intercept to appear
in the cointegrating vector.
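The lag-length test from Step 1 that leads to a choice such as p = 2 can be sketched in Python with NumPy. Everything specific here is an assumption made for illustration: the data are three simulated random walks rather than the series in Figure 6.2, the function names are hypothetical, both VARs are estimated over the same sample so the residual matrices are comparable, and c is taken to be the number of parameters in each equation of the unrestricted VAR.

```python
import numpy as np

def residual_covariance(x, p, start):
    """Residual covariance matrix of an OLS VAR(p) with intercept, using rows start..end."""
    regressors = np.array([np.concatenate([[1.0]] + [x[t - j] for j in range(1, p + 1)])
                           for t in range(start, len(x))])
    targets = x[start:]
    coef, *_ = np.linalg.lstsq(regressors, targets, rcond=None)
    resid = targets - regressors @ coef
    return resid.T @ resid / len(resid)

def lag_length_lr(x, p_long, p_short):
    """Return (T - c)(log|Sigma_short| - log|Sigma_long|) and its degrees of freedom."""
    n = x.shape[1]
    usable = len(x) - p_long                       # same sample for both VARs
    sigma_long = residual_covariance(x, p_long, p_long)
    sigma_short = residual_covariance(x, p_short, p_long)
    c = 1 + n * p_long                             # assumption: parameters per equation
    log_det_long = np.linalg.slogdet(sigma_long)[1]
    log_det_short = np.linalg.slogdet(sigma_short)[1]
    stat = (usable - c) * (log_det_short - log_det_long)
    dof = n * n * (p_long - p_short)               # 3n^2 when testing lags 2 through 4
    return float(stat), dof

rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(size=(100, 3)), axis=0)   # hypothetical data: three random walks
stat, dof = lag_length_lr(x, p_long=4, p_short=1)
print("LR statistic:", stat, "degrees of freedom:", dof)
```

The statistic is then compared with a $\chi^2$ critical value for the stated degrees of freedom; a value below the critical value suggests the shorter lag length is adequate.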


As always, carefully analyze the properties of the residuals of the estimated
model. Any evidence that the errors are not white-noise usually
means that lag lengths are too short. Figure 6.4 shows the deviations of $y_t$
from the long-run equilibrium relationship ($\mu_t = -0.01331 - 1.0000y_t -
1.0350z_t + 1.0162w_t$) and one of the short-run error sequences [i.e., the
$\{\varepsilon_{yt}\}$ sequence that equals the residuals from the $y_t$ equation in (6.62)].
Both sequences conform to their theoretical properties in that the residuals
from the long-run equilibrium relationship appear to be stationary and the
estimated values of the $\{\varepsilon_{yt}\}$ sequence approximate a white-noise process.

The estimated values of the characteristic roots of the $\pi$ matrix in (6.62)
are

$$\hat{\lambda}_1 = 0.32600, \quad \hat{\lambda}_2 = 0.14032, \quad \text{and} \quad \hat{\lambda}_3 = 0.033168$$


Figure 6.4 Long- and short-run errors. (Series plotted: long-run error and residual.)


Since T = 98 (100 observations less the two lost as a result of using two
lags), the calculated values of $\lambda_{max}$ and $\lambda_{trace}$ for the various possible values
of r are reported in the center column of Table 6.6.

Consider the hypothesis that the variables are not cointegrated (so that
the rank $\pi$ = 0). Depending on the alternative hypothesis, there are two
possible test statistics to use. If we are simply interested in the hypothesis
that the variables are not cointegrated (r = 0) against the alternative of one
or more cointegrating vectors (r > 0), we can calculate the $\lambda_{trace}(0)$ statistic:

$$\begin{aligned}
\lambda_{trace}(0) &= -T[\ln(1 - \hat{\lambda}_1) + \ln(1 - \hat{\lambda}_2) + \ln(1 - \hat{\lambda}_3)] \\
&= -98[\ln(1 - 0.326) + \ln(1 - 0.14032) + \ln(1 - 0.033168)] \\
&= 56.786
\end{aligned}$$


Table 6.6 The $\lambda_{max}$ and $\lambda_{trace}$ Tests

Null          Alternative                              95%               90%
Hypothesis    Hypothesis                               Critical Value    Critical Value

$\lambda_{trace}$ tests     $\lambda_{trace}$ value
r = 0         r > 0          56.786                    35.068            32.093
r <= 1        r > 1          18.123                    20.168            17.957
r <= 2        r > 2           3.306                     9.094             7.563

$\lambda_{max}$ tests       $\lambda_{max}$ value
r = 0         r = 1          38.663                    21.894            19.796
r = 1         r = 2          14.817                    15.752            13.781
r = 2         r = 3           3.306                     9.094             7.563


Since 56.786 exceeds the 95% critical value of the $\lambda_{trace}$ statistic (in the
bottom portion of Table B, the critical value is 35.068), it is possible to reject
the null hypothesis of no cointegrating vectors and accept the alternative
of one or more cointegrating vectors. Next, we can use the $\lambda_{trace}(1)$
statistic to test the null of $r \le 1$ against the alternative of two or three cointegrating
vectors. In this case, the $\lambda_{trace}(1)$ statistic is

$$\begin{aligned}
\lambda_{trace}(1) &= -T[\ln(1 - \hat{\lambda}_2) + \ln(1 - \hat{\lambda}_3)] \\
&= -98[\ln(1 - 0.14032) + \ln(1 - 0.033168)] \\
&= 18.123
\end{aligned}$$

Since 18.123 is less than the 95% critical value of 20.168, we cannot reject
the null hypothesis at this significance level. However, 18.123 does
exceed the 90% critical value of 17.957; some researchers might reject the
null and accept the alternative of two or three cointegrating vectors. The
$\lambda_{trace}(2)$ statistic indicates no more than two cointegrating vectors at the
90% significance level.

The $\lambda_{max}$ statistic does not help to clarify the issue. The null hypothesis
of no cointegrating vectors (r = 0) against the specific alternative r = 1 is
clearly rejected. The calculated value $\lambda_{max}(0, 1) = -98\ln(1 - 0.326)$ =
38.663 exceeds the 95% critical value of 21.894. Note that the test of the
null hypothesis r = 1 against the specific alternative r = 2 cannot be rejected
at the 95% level, but can be rejected at the 90% level. The calculated
value of $\lambda_{max}(1, 2)$ is $-98\ln(1 - 0.14032)$ = 14.817, whereas the critical
values at the 95 and 90% significance levels are 15.752 and 13.781,
respectively. Even though the actual data-generating process contains only
one cointegrating vector, the realizations are such that researchers willing
to use the 90% significance level would incorrectly conclude that there are
two cointegrating vectors. Failing to reject an incorrect null hypothesis is
always a danger of using wide confidence intervals.
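The sequential use of the $\lambda_{max}$ statistic can be written as a short Python routine. The statistics and critical values are those quoted in the text and Table 6.6; the function name `select_rank` and the stopping rule (stop at the first null that cannot be rejected) are a sketch of the procedure described above, not a substitute for a full implementation.

```python
# lambda_max statistics for the nulls r = 0, 1, 2 and the matching critical values.
LAMBDA_MAX = [38.663, 14.817, 3.306]
CRITICAL = {
    0.95: [21.894, 15.752, 9.094],
    0.90: [19.796, 13.781, 7.563],
}

def select_rank(stats, critical_values):
    """Return the first r whose null hypothesis (rank = r) is not rejected."""
    for r, (stat, crit) in enumerate(zip(stats, critical_values)):
        if stat <= crit:
            return r
    return len(stats)   # every null rejected: full rank

for level, crits in CRITICAL.items():
    print(f"{level:.0%} level: r = {select_rank(LAMBDA_MAX, crits)}")
```

At the 95% level the routine stops at r = 1, the true number of cointegrating vectors in the simulated data, while at the 90% level it continues to r = 2.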


STEP 3: Analyze the normalized cointegrating vector(s) and speed of adjustment
coefficients. If we select r = 1, the estimated cointegrating vector $(\beta_0, \beta_1, \beta_2, \beta_3)$ is

$$\beta = (0.00553, \; 0.41532, \; 0.42988, \; -0.42207)$$

If we normalize with respect to $\beta_1$, the normalized cointegrating vector
and speed of adjustment parameters are

$$\beta = (-0.01331, \; -1.0000, \; -1.0350, \; 1.0162)$$

$$\alpha_y = 0.54627, \quad \alpha_z = 0.16578, \quad \alpha_w = 0.21895$$



Recall that the data were constructed imposing the long-run relationship
$w_t = y_t + z_t$, so that the estimated coefficients of the normalized $\beta$ vector
are close to their theoretical values of (0, -1, -1, 1). Consider the following
tests:


1. The test that $\beta_0 = 0$ entails one restriction on one cointegrating vector;
hence, the likelihood ratio test has a $\chi^2$ distribution with one degree of
freedom. The calculated value of $\chi^2 = 0.011234$ is not significant at
conventional levels. Hence, we cannot reject the null hypothesis that
$\beta_0 = 0$. Thus, it is possible to use the form of the model in which there is
neither a drift nor an intercept in the cointegrating vector. To clarify
the issue concerning the number of cointegrating vectors, it would
be wise to reestimate the model excluding the constant from the
cointegrating vector.


2. To restrict the normalized cointegrating vector such that $\beta_2 = -1$ and
$\beta_3 = 1$ entails two restrictions on one cointegrating vector; hence, the
likelihood ratio test has a $\chi^2$ distribution with two degrees of freedom. The
calculated value of $\chi^2 = 0.55350$ is not significant at conventional
levels. Hence, we cannot reject the null hypothesis that $\beta_2 = -1$ and $\beta_3 = 1$.

3. To test the joint restriction $\beta = (0, -1, -1, 1)$ entails the three
restrictions $\beta_0 = 0$, $\beta_2 = -1$, and $\beta_3 = 1$. The calculated value of $\chi^2$ with three
degrees of freedom is 1.8128, so that the significance level is 0.612.
Hence, we cannot reject the null hypothesis that the cointegrating
vector is $\beta = (0, -1, -1, 1)$.
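The significance levels behind these three likelihood ratio tests can be recovered from the $\chi^2$ distribution. The sketch below is only illustrative: it uses the closed-form upper-tail probabilities available for one, two, and three degrees of freedom, and the helper name `chi2_sf` is our own, not part of any package.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square > x); closed forms for df = 1, 2, 3 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2.0))
                + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise ValueError("this sketch only handles df = 1, 2, 3")

# (restriction, calculated chi-square, degrees of freedom) as reported in the text
tests = [
    ("beta0 = 0", 0.011234, 1),
    ("beta2 = -1 and beta3 = 1", 0.55350, 2),
    ("beta = (0, -1, -1, 1)", 1.8128, 3),
]

p_values = {name: chi2_sf(stat, df) for name, stat, df in tests}
for name, p in p_values.items():
    print(name, "significance level:", round(p, 3))
```

For the joint restriction, the computed tail probability agrees with the significance level of 0.612 reported above.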


STEP 4: Innovation accounting. Finally, innovation accounting and causality tests 
on the error-correction model of (6.62) could help to identify a structural 
model and determine whether the estimated model appears to be reason- 
able. Since the simulated data have no economic meaning, innovation ac- 
counting is not performed here. 


10. GENERALIZED PURCHASING-POWER PARITY 


Most studies of purchasing-power parity (PPP) find the theory inadequate to
explain price and exchange rate movements for low-inflation countries during the
post-World War II period. The theory of generalized purchasing-power parity
(G-PPP) was developed in Enders and Hurn (1994) to explain the observed
nonstationarity of real exchange rate behavior. The idea is that traditional PPP can
fail because the fundamental macroeconomic variables that determine real exchange
rates, such as real output levels and expenditure patterns, are nonstationary; thus,
the real rates themselves will tend to be nonstationary.²¹ Although bilateral real
exchange rates are generally nonstationary, G-PPP hypothesizes that they will exhibit
common stochastic trends if the fundamental variables (i.e., the forcing variables)
are sufficiently interrelated.

G-PPP can be interpreted in terms of optimum currency areas. In the two-country 
case, the real exchange rate between the two countries comprising the domain of a 
currency area should be stationary. In a multicountry setting, within an appropri- 
ately defined currency area, the forcing variables will be sufficiently interrelated, so 
that the real exchange rates themselves will share common trends. Hence, within a 
currency area we would expect there to be at least one linear combination of the 
various bilateral real exchange rates that is stationary. 

To test the theory, we obtained wholesale prices and exchange rates from the 
IMF data tapes over the period January 1973 to December 1989 for Australia, 
Germany, India, Indonesia, Japan, Korea, Philippines, Singapore, Thailand, the 
U.K., and the United States.²² The real exchange rate series were constructed using
Japan as the base country; for each country, we defined the real bilateral exchange 
rate with Japan to be the log of the domestic WPI plus the log of the domestic cur- 
rency price of the yen minus the log of the Japanese WPI. All were then normal- 
ized, so that the real rates in January 1973 are all equal to zero (for Indonesia, 
January 1974 = 0). If we use augmented Dickey-Fuller (1979, 1981) and Phillips-Perron
(1988) tests with 12 lags (since monthly data are used), it is not possible to
reject a null of a unit root at conventional significance levels for any of the series. 
These findings are hardly surprising; they simply confirm what other studies have 
concluded about the nonstationarity of real exchange rates in the post-Bretton 
Woods period. You can use the data contained in the file REALRATE.PRN along 
with the discussion below. 
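The construction of each real rate described above can be sketched in a few lines. The numbers below are invented purely for illustration (they are not taken from REALRATE.PRN), and the function name `real_rate_vs_japan` is hypothetical.

```python
import math

def real_rate_vs_japan(wpi_domestic, yen_price, wpi_japan):
    """Log real bilateral rate with Japan, as defined in the text:
    log(domestic WPI) + log(domestic-currency price of the yen) - log(Japanese WPI),
    normalized so that the first observation equals zero."""
    raw = [math.log(p) + math.log(e) - math.log(pj)
           for p, e, pj in zip(wpi_domestic, yen_price, wpi_japan)]
    return [value - raw[0] for value in raw]

# Made-up monthly observations (assumption: three periods, arbitrary units)
wpi_dom = [100.0, 102.0, 105.0]
yen_price = [0.0040, 0.0041, 0.0039]
wpi_jp = [100.0, 101.0, 101.5]

r = real_rate_vs_japan(wpi_dom, yen_price, wpi_jp)
print([round(v, 4) for v in r])
```

Subtracting the first observation is what makes every series start at zero in the base month.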

In accord with G-PPP, suppose that m of the countries in an n-country world
comprise the domain of a currency area; for these m countries, there exists a
long-run equilibrium relationship between the m - 1 bilateral real rates such that

$r_{12t} = \beta_0 + \beta_{13} r_{13t} + \beta_{14} r_{14t} + \cdots + \beta_{1m} r_{1mt} + e_t$    (6.63)

where $r_{1it}$ = the bilateral real exchange rate in period t between country 1
(Japan in our empirical estimations) and country i
$\beta_0$ = an intercept term
$\beta_{1i}$ = the parameters of the cointegrating vector
$e_t$ = a stationary stochastic disturbance term

For the special case in which all the $\beta_{1i}$ are zero, Equation (6.63) becomes the
familiar PPP relationship between domestic prices, foreign prices, and the exchange
rate.


Empirical Tests 


Our first step is to consider whether there exists a cointegrating vector between the
three real rates for Germany, the U.K., and the United States. Using Japan as the
base country, we calculate the following values for the $\lambda_{trace}$ and $\lambda_{max}$ tests:

$H_0$        $\lambda_{trace}$    $\lambda_{max}$
r = 2      3.32       3.32
r = 1     12.74       9.42
r = 0     28.95      16.21

Using the $\lambda_{trace}$ test, we cannot reject the null hypothesis that r = 0. The
calculated value of 28.95 is less than the 90% critical value of 32.093. If we use the more
specific $\lambda_{max}$ test, a null of r = 0 against the alternative of r = 1 cannot be rejected at
the 80% level of significance (the critical value being 17.474 at the 80%
significance level). Thus, the three real exchange rates are not cointegrated; G-PPP does
not hold among these countries, so it is possible to conclude that these four
countries do not comprise a currency area.
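The two columns of the table are linked: each $\lambda_{trace}(r)$ is the sum of the $\lambda_{max}$ statistics from r onward. A small sketch using only the values and critical values quoted above (the variable names are ours):

```python
# lambda_max statistics for the nulls r = 0, r = 1, r = 2, as reported in the text
lambda_max_stats = [16.21, 9.42, 3.32]

def trace_from_max(max_stats):
    """lambda_trace(r) is the sum of the lambda_max statistics for r, r+1, ..."""
    return [sum(max_stats[r:]) for r in range(len(max_stats))]

trace_stats = trace_from_max(lambda_max_stats)

TRACE_CRIT_90 = 32.093  # critical value quoted in the text for the trace test of r = 0
MAX_CRIT_80 = 17.474    # critical value quoted in the text for the max test of r = 0

print("lambda_trace:", [round(v, 2) for v in trace_stats])
print("reject r = 0 (trace)?", trace_stats[0] > TRACE_CRIT_90)
print("reject r = 0 (max)?", lambda_max_stats[0] > MAX_CRIT_80)
```

Both comparisons fail to reject r = 0, which is the basis for the conclusion that the three rates are not cointegrated.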

Still using Japan as the base country, we next examine whether there exist
cointegrating relationships among the German, U.K., and U.S. rates with the rates of
other Pacific Rim nations. Consider the following four-variable equation:

$r_{1it} = \beta_0 + \beta_{13} r_{13t} + \beta_{14} r_{14t} + \beta_{15} r_{15t} + e_t$    (6.64)

where $r_{1it}$, $r_{13t}$, $r_{14t}$, and $r_{15t}$ refer, respectively, to the logarithms of the bilateral real
exchange rates of country i, the United States, Germany, and the U.K.²³


For each of the seven countries listed in Table 6.7, the $\lambda_{trace}(0)$ statistic is
reported in column 2. With 4 variables, at the 95% level, the critical value of $\lambda_{trace}$ is
53.347. For all countries except India, we can reject a null of no cointegration. If
we examine India in more detail, using the $\lambda_{max}$ statistic, the null of r = 0 against the
alternative r = 1 cannot be rejected at the 90% level. Therefore, we conclude that
G-PPP does not hold for India. However, G-PPP does hold for each of the other
Pacific Rim countries with Germany, Japan, the U.K., and the United States. Since
G-PPP does not hold between Japan, the United States, Germany, and the U.K.
alone, the natural interpretation is that the real exchange rate of each of the smaller
Pacific Rim nations (except India) follows a time path dictated by events in the
larger countries.


Table 6.7 Values of $\lambda_{trace}$ for r = 0

               $\lambda_{trace}$   $\beta_{13}$     $\beta_{14}$     $\beta_{15}$     $\alpha$
Australia      60.35     0.202    0.586   -0.549   -0.07
India          46.49     1.436    0.985    1.302    0.02
Indonesia      56.93     1.513    1.390    1.750   -0.04
Korea          63.11    -0.497    1.443   -0.995   -0.05
Philippines    56.91     0.720   -0.352    0.253   -0.47
Thailand       64.25     0.986    0.893    0.383    0.04
Singapore      55.44     1.173    0.681    0.638    0.066


The interrelationships among the various real exchange rates are reflected by the
coefficients of the equilibrium relationship reported in columns 3 to 5 of Table 6.7.
The various $\beta_{1j}$ can be interpreted straightforwardly as long-run elasticities. For
example, the Australian bilateral real rate with Japan changes by 0.202% in
response to a 1% change in the U.S./Japanese bilateral real exchange rate. Notice that
the absolute values of the $\beta_{1j}$ are generally quite large; only five of the 21 estimated
coefficients are less than 0.5 in absolute value.
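The statements made about Table 6.7 are easy to verify mechanically. The sketch below simply transcribes the table (the Indonesian trace statistic is read as 56.93) and applies the 95% critical value quoted in the text; the dictionary layout is our own choice.

```python
TRACE_CRIT_95 = 53.347  # 95% critical value for lambda_trace with 4 variables (from the text)

# country: (lambda_trace, beta13, beta14, beta15, alpha), transcribed from Table 6.7
table_6_7 = {
    "Australia":   (60.35,  0.202,  0.586, -0.549, -0.07),
    "India":       (46.49,  1.436,  0.985,  1.302,  0.02),
    "Indonesia":   (56.93,  1.513,  1.390,  1.750, -0.04),
    "Korea":       (63.11, -0.497,  1.443, -0.995, -0.05),
    "Philippines": (56.91,  0.720, -0.352,  0.253, -0.47),
    "Thailand":    (64.25,  0.986,  0.893,  0.383,  0.04),
    "Singapore":   (55.44,  1.173,  0.681,  0.638,  0.066),
}

# Countries for which the null of no cointegration cannot be rejected
no_cointegration = [c for c, row in table_6_7.items() if row[0] <= TRACE_CRIT_95]

# All long-run elasticities and those below 0.5 in absolute value
betas = [b for row in table_6_7.values() for b in row[1:4]]
small = [b for b in betas if abs(b) < 0.5]

print("cannot reject r = 0:", no_cointegration)
print(len(small), "of", len(betas), "coefficients are below 0.5 in absolute value")
```

Only India falls below the critical value, and the count of small coefficients matches the statement in the text.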

The sixth column of Table 6.7 reports the weights or “speed of adjustment”
coefficients with which a discrepancy from G-PPP affects the real rate between country
i and Japan. The speed of adjustment coefficients for the large countries are not
significant and not shown in the table. Note that for all countries except the
Philippines, the speed of adjustment coefficients are rather small; thus, any deviation
from G-PPP can be expected to persist for a relatively long period of time.²⁴


The Australia, Korea, Philippines Group 


Since we had reason to believe that the rates for Australia, Korea, and the
Philippines are interrelated, it is interesting to examine this group in greater detail.
Letting rau, rko, and rph denote the logs of the Australian, Korean, and Philippine
real bilateral exchange rates with Japan, we estimated the following long-run
equilibrium relation:

$11.62\, rau_t - 6.65\, rko_t - 9.58\, rph_t + 3.152 = 0$    (6.65)

or normalizing with respect to the Australian real rate, we get

$rau_t = 0.572\, rko_t + 0.825\, rph_t - 0.271$    (6.66)
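Normalization is just a matter of dividing through by the coefficient on the chosen variable and moving the other terms to the right-hand side. A minimal sketch, using the rounded coefficients printed in (6.65); the helper `normalize` is illustrative only, and because it starts from rounded inputs the third decimal can differ slightly from (6.66).

```python
def normalize(coefs, on):
    """Solve sum(coef * variable) = 0 for the variable `on`.
    Returns the right-hand-side coefficients of the normalized equation."""
    pivot = coefs[on]
    return {name: -value / pivot for name, value in coefs.items() if name != on}

# Coefficients of (6.65): 11.62 rau - 6.65 rko - 9.58 rph + 3.152 = 0
eq_6_65 = {"rau": 11.62, "rko": -6.65, "rph": -9.58, "const": 3.152}

eq_6_66 = normalize(eq_6_65, "rau")
for name, value in eq_6_66.items():
    print(name, round(value, 4))
```

Normalizing on rko or rph instead only requires changing the second argument.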


In the formal tests for cointegration, the calculated $\lambda_{trace}$ test statistic for the null
r = 0 equals 39.95; this null can be strongly rejected at the 99% significance level.
Moreover, both the $\lambda_{trace}$ and $\lambda_{max}$ tests indicate that this cointegrating vector is
unique (so that r = 1).

The Johansen procedure allows us to test restrictions on the cointegrating vector. 
We tested the following restrictions on equation (6.66): 


$H_1$: The coefficients on rko and rph sum to unity


If the sum of these two coefficients is equal to unity, (6.66) can be rewritten solely
in terms of the Australian bilateral rate with Korea and the Korean bilateral rate with
the Philippines. The calculated $\chi^2$ statistic is 11.36; with one degree of freedom
(since r = 1 and n - s = 1), $\chi^2_{0.01} = 6.63$ and we reject the restriction; thus, we can
reject the hypothesis that the Japanese price level does not enter into Equation (6.65).


$H_2$: Zero restrictions


Restricting the coefficient on rko to equal zero yields a $\chi^2$ value of 7.90; restricting
the coefficient on rph to equal zero yields a $\chi^2$ value of 12.94. Again, with one degree
of freedom, we reject each restriction at the 1% significance level.


$H_3$: Equality restrictions


Restricting the coefficients on rko and rph to be equal yields a $\chi^2$ statistic of 4.83.
We can reject the restriction at the 5% (but not the 1%) significance level since
$\chi^2_{0.05} = 3.84$. The restricted cointegration vector becomes:


rau = 0.653(rko + rph) — 0.271 (6.67) 


Certainly, there is strong evidence that G-PPP holds among this subset of coun- 
tries. The question is whether Australia, Korea, and the Philippines as a group form 
their own currency area with Japan. Next, we compare the residual variances of rau, 
rko, and rph when estimated in the system given by (6.64) versus the residual vari- 
ances when the rates are estimated by Equation (6.65). 


Variances of Residuals

        Equation 6.64    Equation 6.65
rau     0.00105          0.00114
rko     0.00066          0.00073
rph     0.00087          0.00105


Notice that for each of the three real rates, the residuals have the smallest vari- 
ance when estimated as in Equation (6.64). Thus, for Australia, Korea, and the 
Philippines, real exchange rate movements are more heavily influenced by 
Germany, Japan, the U.K., and the United States than by each other. Since these three
countries are the most likely of the Pacific Rim nations to constitute a currency 
area, there is little evidence that any subgroup of Pacific Rim nations constitutes a 
currency area. Rather, each Pacific nation has its own real rate influenced by the set 
of the larger nations. 


SUMMARY AND CONCLUSIONS 


Many economic theories imply that a linear combination of certain nonstationary
variables must be stationary. For example, if the variables $\{x_{1t}\}$, $\{x_{2t}\}$, and $\{x_{3t}\}$ are
I(1) and the linear combination $e_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t}$ is stationary, the
variables are said to be cointegrated of order (1, 1). The vector $(\beta_0, \beta_1, \beta_2, \beta_3)$ is called
the cointegrating vector. Cointegrated variables have the same stochastic trends and
so cannot drift too far apart. Cointegrated variables have an error-correction
representation such that each responds to the deviation from “long-run equilibrium.”
One way to check for cointegration is to examine the residuals from the long-run
equilibrium relationship. If these residuals have a unit root, the variables cannot be
cointegrated of order (1, 1). Another way to check for cointegration among I(1)
variables is to estimate a VAR in first differences and include the lagged level of
the variables in some period t - p. If we use a multivariate generalization of the
Dickey-Fuller test, the vector can be checked for the presence of unit roots. In an n
equation system, n minus the number of unit roots equals the number of
cointegrating vectors.
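The residual-based check summarized above can be illustrated with simulated data. The sketch below is not the procedure applied to any data set in this chapter: it generates two artificial series that share one stochastic trend (all parameter values are arbitrary assumptions), estimates the long-run relationship by OLS, and computes an unaugmented Dickey-Fuller t-statistic on the residuals. In practice the statistic must be compared with the Engle-Granger critical values rather than the ordinary Dickey-Fuller tables.

```python
import random

def ols_intercept_slope(x, y):
    """Simple OLS of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def df_t_stat(e):
    """t-statistic for a1 in: delta e_t = a1 * e_{t-1} + error (no intercept, no lags)."""
    lagged = e[:-1]
    diff = [e[t] - e[t - 1] for t in range(1, len(e))]
    sxx = sum(v * v for v in lagged)
    a1 = sum(l * d for l, d in zip(lagged, diff)) / sxx
    resid = [d - a1 * l for l, d in zip(lagged, diff)]
    s2 = sum(u * u for u in resid) / (len(diff) - 1)
    return a1 / (s2 / sxx) ** 0.5

random.seed(0)
T = 200
trend = 0.0
y, z = [], []
for _ in range(T):
    trend += random.gauss(0, 1)              # common random-walk trend
    y.append(trend + random.gauss(0, 0.5))   # each series = trend + stationary noise
    z.append(trend + random.gauss(0, 0.5))

b0, b1 = ols_intercept_slope(z, y)           # long-run equilibrium regression
residuals = [yi - b0 - b1 * zi for yi, zi in zip(y, z)]
t_stat = df_t_stat(residuals)
print("slope:", round(b1, 3), "DF t-statistic on residuals:", round(t_stat, 2))
```

Because the simulated residuals are stationary by construction, the t-statistic should be strongly negative; with independent random walks it typically would not be.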

The $\lambda_{trace}$ and $\lambda_{max}$ test statistics can be used to help determine the number of
cointegrating vectors. These tests are sensitive to the presence of the deterministic
regressors included in the cointegrating vector(s). Restrictions on the cointegrating
vector(s) and/or speed of adjustment parameters can be tested using $\chi^2$ statistics.

The Johansen and Juselius tables are extended to allow for more than five
variables in Osterwald-Lenum (1992). Also, there is a growing body of work
considering hypothesis testing in a cointegration framework. Park (1992) develops a
non-parametric method for the estimation and testing of cointegrating vectors. 
Johansen and Juselius (1992) and Horvath and Watson (1993) discuss the testing of 
structural hypotheses within a cointegration framework. A useful review of the hy- 
pothesis testing is provided by Johansen (1991). 

The literature is proceeding in several interesting directions. Friedman and 
Kuttner (1992) use cointegration tests to show that significant relationships be- 
tween money, income, and interest rates break down in the 1980s. The paper makes 
an excellent companion piece with this chapter since it also uses innovation ac- 
counting techniques. Another interesting money demand study using the techniques 
in this chapter is Baba, Hendry, and Starr (1992). Gregory and Hansen (1992) con- 
sider the possibility of a structural break in a cointegrated system. The intercept 
and/or slope coefficients of the cointegrating vector are allowed to experience a 
regime shift at an unknown date. King et al. (1991) combine cointegration tests 
with the type of structural decompositions considered in Chapter 5. 


QUESTIONS AND EXERCISES 
1. Let Equations (6.14) and (6.15) contain intercept terms such that

$y_t = a_{10} + a_{11} y_{t-1} + a_{12} z_{t-1} + \varepsilon_{yt}$ and $z_t = a_{20} + a_{21} y_{t-1} + a_{22} z_{t-1} + \varepsilon_{zt}$

A. Show that the solution for $y_t$ can be written as

$y_t = \dfrac{(1 - a_{22}L)\varepsilon_{yt} + (1 - a_{22})a_{10} + a_{12}L\varepsilon_{zt} + a_{12}a_{20}}{(1 - a_{11}L)(1 - a_{22}L) - a_{12}a_{21}L^2}$

B. Find the solution for $z_t$.

C. Suppose that $y_t$ and $z_t$ are CI(1, 1). Use the conditions in (6.19), (6.20), and
(6.21) to write the error-correcting model. Compare your answer to (6.22)
and (6.23). Show that the error-correction model contains an intercept term.

D. Show that $\{y_t\}$ and $\{z_t\}$ have the same deterministic time trend (i.e., show
that the slope coefficient of the time trends is identical).

E. What is the condition such that the slope of the trend is zero? Show that this
condition is such that the constant can be included in the cointegrating
vector.


2. The data file COINT6.PRN contains the three simulated series used in Sections
5 and 9. You should find that the properties of the data are such that

Series  Observations  Mean            Standard Error  Minimum         Maximum
Y       100           -4.2810736793   1.4148612773    -6.3307043375   -1.2512548288
Z       100           -2.1437335637   1.7951179043    -5.7040632238    0.6257029853
W       100           -6.3677952867   2.3914380011    -9.6848404427   -1.4460513399

A. Use the data to reproduce the results in Section 5.


B. Use the data to reproduce the results in Section 9. 


3. The data file REALRATE.PRN contains the real exchange rate series used in 
Section 10. Use the series to reproduce the results in Section 10. 


4. The second, third, and fourth columns of the file labeled US.PRN contain the
interest rates paid on 3-month, 3-year, and 10-year U.S. government securities.
The data run from 1960:Q1 to 1991:Q4. These columns are labeled TBILL,
r3, and r10, respectively. You should find that the properties of the data are such
that

Series  Observations  Mean     Standard Error  Minimum  Maximum
TBILL   128           6.3959   2.7915          2.3200   15.0900
r3      128           7.3666   2.8113          3.3700   15.7900
r10     128           7.6299   2.7627          3.7900   14.8500


A. Pretest the variables to show that all the rates act as unit root processes.
Specifically, perform augmented Dickey-Fuller tests with 1, 4, and 8 lags.
You should obtain

Series  Statistic  Sample      Observations  Without Trend       With Trend
TBILL   ADF(1)     60Q3 91Q4   126           -2.3007 (-2.8844)   -2.2850 (-3.4458)
        ADF(4)     61Q2 91Q4   123           -2.2112 (-2.8849)   -2.0101 (-3.4466)
        ADF(8)     62Q2 91Q4   119           -2.0913 (-2.8857)   -1.8901 (-3.4478)
r3      ADF(1)     60Q3 91Q4   126           -1.8902 (-2.8844)   -1.7706 (-3.4458)
        ADF(4)     61Q2 91Q4   123           -1.9902 (-2.8849)   -1.6882 (-3.4466)
        ADF(8)     62Q2 91Q4   119           -1.6772 (-2.8857)   -1.1362 (-3.4478)
r10     ADF(1)     60Q3 91Q4   126           -1.6974 (-2.8844)   -1.5642 (-3.4458)
        ADF(4)     61Q2 91Q4   123           -1.9028 (-2.8849)   -1.8007 (-3.4466)
        ADF(8)     62Q2 91Q4   119           -1.5170 (-2.8857)   -0.89269 (-3.4478)

95% critical values appear in parentheses.


B. Estimate the cointegrating relationships using the Engle-Granger procedure.
Perform augmented Dickey-Fuller tests on the residuals. Using TBILL as
the “dependent” variable, you should find

$TBILL_t = 0.050882 + 2.2535\, r3_t - 1.3441\, r10_t$
           (0.35017)   (21.3184)    (-12.4961)

where t-statistics are in parentheses.

Unit root tests for residuals

Statistic  Sample      Value
ADF(1)     60Q3 91Q4   -5.3486
ADF(4)     61Q2 91Q4   -4.5669
ADF(8)     62Q2 91Q4   -3.4573
ADF(12)    63Q2 91Q4   -3.0687

The 95% critical value is about -3.81. Based on these data, do you conclude that
the variables are cointegrated?


C. Repeat part B using r10 as the “dependent” variable. You should find that

Unit root tests for residuals

Statistic  Sample      Value
ADF(1)     60Q3 91Q4   -4.9209
ADF(4)     61Q2 91Q4   -3.33
ADF(8)     62Q2 91Q4   -2.1910
ADF(12)    63Q2 91Q4   -1.4109


D. Estimate an error-correcting model using only one lag of each variable. For
the TBILL equation, you should find

$\Delta TBILL_t = 0.011346 + 0.24772\,\hat{e}_{t-1} - 0.15598\,\Delta TBILL_{t-1} + 0.73044\,\Delta r3_{t-1} - 0.48743\,\Delta r10_{t-1} + \varepsilon_{TBILL,t}$

where $\hat{e}_{t-1}$ is the lagged residual from your estimate in part B.

Diagnose the problems with this regression equation. You should find
i. All coefficients are insignificant.
ii. The $\{\varepsilon_{TBILL,t}\}$ sequence exhibits serial correlation.
iii. Large volatility of the residuals in the early 1980s.

How would you attempt to correct these problems?


E. Estimate the model using the Johansen procedure. Use four lags and include
an intercept in the cointegrating vector. You should find that

List of characteristic roots (i.e., eigenvalues) in descending order:

0.15307  0.10840  0.031092

        Trace Tests                      Maximum Eigenvalue Tests
Null    Alternative   $\lambda_{trace}$      Null    Alternative   $\lambda_{max}$
r = 0   r ≥ 1         38.7453      r = 0   r = 1         20.6006
r ≤ 1   r ≥ 2         18.1447      r = 1   r = 2         14.2280
r ≤ 2   r = 3          3.9167      r = 2   r = 3          3.9167

i. Explain why the $\lambda_{trace}$ test strongly suggests that there is exactly one
cointegrating vector.

ii. To what extent is this result reinforced by the $\lambda_{max}$ test?

iii. Explain why there may be a discrepancy in the results.


F. Given that there is one cointegrating vector, verify that the normalized
cointegrating vector is

$TBILL_t = 1.8892\, r3_t - 0.95116\, r10_t - 0.27438$

i. Compare this result to your answer in part C.

ii. Show that the speed of adjustment parameters for the normalized TBILL,
r3, and r10 equations are

TBILL: -0.096246    r3: -0.38181    r10: 0.3538

iii. What do the negative signs imply about the adjustment process?


G. Test the restriction $\beta_0 = 0$. You should find that the estimated cointegrating
vector is

$TBILL_t = 1.9459\, r3_t - 1.0376\, r10_t$

and the $\chi^2$ statistic with one degree of freedom is 0.80839.


H. Estimate the model assuming that there is a drift. You should find that the
characteristic roots are

0.15298  0.10619  0.025545

Given that r = 1, verify that the summation indicated by (6.57) yields

$-124[\ln(1 - 0.10840) + \ln(1 - 0.031092) - \ln(1 - 0.10619) - \ln(1 - 0.025545)] = 1.01$

Do you conclude there is a drift term?


5. Suppose you estimate $\pi$ to be:

$\pi = \begin{bmatrix} 0.6 & -0.5 & 0.2 \\ 0.3 & -0.25 & 0.1 \\ 1.2 & -1.0 & 0.4 \end{bmatrix}$

A. Show that the determinant of $\pi$ is zero.

B. Show that two of the characteristic roots are zero and that the third is 0.75.

C. Let $\beta' = (3 \quad -2.5 \quad 1)$ be the single cointegrating vector normalized with
respect to $x_3$. Find the (3 × 1) vector $\alpha$ such that $\pi = \alpha\beta'$. How would $\alpha$ change
if you normalized $\beta$ with respect to $x_1$?


D. Describe how you could test the restriction $\beta_1 + \beta_2 = 0$.
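Now suppose you estimate $\pi$ to be:

$\pi = \begin{bmatrix} 0.8 & 0.4 & 0.0 \\ 0.2 & 0.1 & 0.0 \\ 0.75 & 0.25 & 0.5 \end{bmatrix}$

E. Show that the three characteristic roots are 0.0, 0.5, 0.9.

F. Select $\beta$ such that:

$\beta = \begin{bmatrix} 0.8 & 0.75 \\ 0.4 & 0.25 \\ 0.0 & 0.5 \end{bmatrix}$

Find the (3 × 2) matrix $\alpha$ such that $\pi = \alpha\beta'$.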


6. Suppose that $x_{1t}$ and $x_{2t}$ are integrated of orders 1 and 2, respectively. You are to
sketch the proof that any linear combination of $x_{1t}$ and $x_{2t}$ is integrated of order
2. Towards this end:

A. Allow $x_{1t}$ and $x_{2t}$ to be the random walk processes

$x_{1t} = x_{1t-1} + \varepsilon_{1t}$

and

$x_{2t} = x_{2t-1} + \varepsilon_{2t}$

i. Given the initial conditions $x_{10}$ and $x_{20}$, show that the solutions for $x_{1t}$ and
$x_{2t}$ have the form $x_{1t} = x_{10} + \sum \varepsilon_{1t-i}$ and $x_{2t} = x_{20} + \sum \varepsilon_{2t-i}$.

ii. Show that the linear combination $\beta_1 x_{1t} + \beta_2 x_{2t}$ will generally contain a
stochastic trend.

iii. What assumption is necessary to ensure that $x_{1t}$ and $x_{2t}$ are CI(1, 1)?

B. Now let $x_{2t}$ be integrated of order 2. Specifically, let $\Delta x_{2t} = \Delta x_{2t-1} + \varepsilon_{2t}$.
Given initial conditions for $x_{20}$ and $x_{21}$, find the solution for $x_{2t}$. [You may
allow $\varepsilon_{1t}$ and $\varepsilon_{2t}$ to be perfectly correlated.]

Is there any linear combination of $x_{1t}$ and $x_{2t}$ that contains only a stochastic
trend?

Is there any linear combination of $x_{1t}$ and $x_{2t}$ that does not contain a stochastic
trend?

C. Provide an intuitive explanation for the statement: If $x_{1t}$ and $x_{2t}$ are integrated
of orders $d_1$ and $d_2$ where $d_2 > d_1$, any linear combination of $x_{1t}$ and $x_{2t}$ is
integrated of order $d_2$.


ENDNOTES 


1. To include an intercept term, simply set all realizations of one $\{x_{it}\}$ sequence equal to
unity. In the text, the long-run relationship with an intercept will be denoted by
$\beta_0 + \beta_1 x_{1t} + \cdots + \beta_n x_{nt} = 0$. Also note that the definition rules out the trivial case in which all
elements of $\beta$ equal zero. Obviously if all the $\beta_i = 0$, $\beta x' = 0$.


2. Suppose that $x_{1t}$ and $x_{2t}$ are I(2) and that $x_{3t}$ is I(1). If $x_{1t}$ and $x_{2t}$ are CI(2, 1), there
exists a linear combination of the form $\beta_1 x_{1t} + \beta_2 x_{2t}$ that is I(1). It is possible that this
combination of $x_{1t}$ and $x_{2t}$ is cointegrated with $x_{3t}$ such that the linear combination
$\beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t}$ is stationary.

3. As a technical point, note that if all elements of $x_t$ are I(0), it is possible for $e_t$ to be
integrated of order -1. However, this case is of little interest for economic analysis. Also
note that if $\{x_t\}$ is stationary, $\Delta^d x_t$ is stationary for all d > 0.

4. The issue is trivial if both trends are deterministic. Simply detrend each of the variables
using a deterministic polynomial time trend of the form $\alpha_0 + \alpha_1 t + \alpha_2 t^2 + \cdots$.

5. From Chapter 3 you will recall that the decomposition of an I(1) variable into a random
walk plus a noise term is not unique. Stock and Watson confine their analysis to trends
that are random walks.

6. The error-correction term could have been written in the form $\alpha(\beta_1 r_{Lt-1} - \beta_2 r_{St-1})$.
Normalization with respect to the long-term rate yields (6.9), where $\alpha_S = \alpha\beta_1$ and
$\beta = \beta_2/\beta_1$. Here, the cointegrating vector is $(1, -\beta)$.

7. Note that (6.11) and (6.12) represent a system of first-order difference equations. The
stability conditions place restrictions on the magnitudes of $\alpha_S$, $\alpha_L$, and the various values
of $a_{ij}(k)$.

8. Equation (6.18) can be written as $\lambda^2 = a_1\lambda + a_2$, where $a_1 = (a_{11} + a_{22})$ and
$a_2 = (a_{12}a_{21} - a_{11}a_{22})$. Now refer all the way back to Figure 1.5 in Chapter 1. For $\lambda_1 = 1$, the
coefficients of (6.18) must lie along line segment BC. Hence, $a_1 + a_2 = 1$, or
$a_{11} + a_{22} + a_{12}a_{21} - a_{11}a_{22} = 1$. Solving for $a_{11}$ yields (6.21). For $|\lambda_2| < 1$, the coefficients must lie inside
region AOBC. Given (6.19), the condition $a_2 - a_1 < 1$ is equivalent to that in (6.21).

9. Another interesting way to obtain this result is to refer back to (6.14). If $a_{12} = 0$,
$y_t = a_{11}y_{t-1} + \varepsilon_{yt}$. Imposing the condition that $\{y_t\}$ is a unit root process is equivalent to setting
$a_{11} = 1$, so that $\Delta y_t = \varepsilon_{yt}$.

10. As mentioned above, with three or more variables, various subsets may be cointegrated.
For example, a group of I(2) variables may be CI(2, 1) or CI(2, 2) or a subset of I(1)
variables may be CI(1, 1). Moreover, a set of CI(2, 1) variables may be cointegrated
with a set of I(1) variables. Form the CI(2, 1) relation and determine whether the
resultant is cointegrated with the I(1) variables.

11. The stability/stationarity condition is such that $-2 < a_1 < 0$. Hence, if $a_1$ is found to be
sufficiently negative, we need to be able to reject the null hypothesis $a_1 = -2$.

12. As shown in Section 3, the values of $\alpha_y$ and $\alpha_z$ are directly related to the characteristic
roots of the difference equation system. Direct convergence necessitates that $\alpha_y$ be
negative and $\alpha_z$ positive.

13. Engle and Granger (1987) do provide a statistic to test the joint hypothesis $\alpha_y = \alpha_z = 0$.
However, their simulations suggest this statistic is not very powerful, and they recommend
against its use.

14. If a variable is found to be integrated of a different order than the others, the remaining
variables can be tested for cointegration.

15. Wholesale prices and period average exchange rates were used in the study. Each series
was converted into an index number such that each series was equal to unity at the
beginning of its respective period (either 1960 or 1973). In the fixed exchange rate period,
all values of $\{e_t\}$ were set equal to unity.

16. A second set of regressions of the form $p_t = \beta_0 + \beta_1 f_t + \mu_t$ was also estimated. The
results using this alternative normalization are very similar to those reported here.

17. Use (6.44) only if the residuals from the equilibrium regression are serially uncorrelated.
Any evidence that $\varepsilon_t$ is not white-noise necessitates using the augmented form of the test
[i.e., Equation (6.45)]. Engle and Granger recommend using the augmented tests when
there is any doubt about the nature of the data-generating process. The unaugmented
tests have very low power if (6.44) is estimated when lags are actually present in the
data-generating process.

18. In Section 3, we allowed the disturbance to be serially correlated. Since we want to per- 
form significance tests, we need the error terms to be white-noise disturbances. 

19. The numbers are slightly different from those reported by Johansen and Juselius (1990) 
due to rounding. 

20. The Johansen procedure consists of the matrix of vectors of the squared canonical
correlations between the residuals of $x_t$ and $\Delta x_{t-1}$ regressed on lagged values of $\Delta x_t$. The
cointegrating vectors are the rows of the normalized eigenvectors.

21. Long-run money neutrality guarantees that nominal variables have only temporary ef- 
fects on real exchange rates. Proportional movements in prices and exchange rates may 
be observed in high inflation countries since the temporary effects of the vast money 
supply movements dwarf the consequences of the nonstationary changes in real vari- 
ables. 

22. The price series for Singapore runs from January 1974 through December 1989 and the 
series for Indonesia from January 1973 through April 1986. Unfortunately, it was not 
possible to obtain wholesale price indices for Hong Kong, Malaysia, or Taiwan. 
Although consumer price indices are readily available, the large weights given to non- 
tradables such as housing and services make them less appropriate for PPP comparisons. 

23. Respectively, Japan, the United States, Germany, and the U.K. are denoted as country 1,
3, 4, and 5. Notice that the values of $\beta_{13}$, $\beta_{14}$, and $\beta_{15}$ will differ for each country i; when
there is a possible ambiguity, we use the notation $\beta_{1j,i}$ to denote the coefficient of $r_{1j}$ in
the cointegrating relationship for country i.

24. As in any difference equation system, the speed of adjustment term can be positive or 
negative. The critical factor is whether the characteristic roots of the system are all less 
than unity in absolute value. Notice that these roots are the estimated values of $\lambda_i$ from
the matrix of canonical correlations. In a sense, the Johansen (1988) procedure is a 
method to determine whether the characteristic roots of the difference equation system 
represented by an error-correction system imply convergence. 


APPENDIX: Characteristic Roots, Stability, and Rank 
Characteristic Roots Defined 
Let A be an (n × n) square matrix with elements $a_{ij}$ and x an (n × 1) vector. The
scalar $\lambda$ is called a characteristic root of A if

$Ax = \lambda x$    (A6.1)

Let I be an (n × n) identity matrix, so that we can rewrite (A6.1) as

$Ax - \lambda x = 0$

or

$(A - \lambda I)x = 0$    (A6.2)

Since x is a vector containing values not identically equal to zero, (A6.2) requires
that the rows of $(A - \lambda I)$ be linearly dependent. Equivalently, (A6.2) requires that
the determinant $|A - \lambda I| = 0$. Thus, we can find the characteristic root(s) of (A6.1)
by finding the values of $\lambda$ that satisfy

$|A - \lambda I| = 0$    (A6.3)


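Example 1
Let A be the matrix:

$A = \begin{bmatrix} 0.5 & -0.2 \\ -0.2 & 0.5 \end{bmatrix}$

so that

$|A - \lambda I| = \begin{vmatrix} 0.5 - \lambda & -0.2 \\ -0.2 & 0.5 - \lambda \end{vmatrix}$

Solving for the value of $\lambda$ such that $|A - \lambda I| = 0$ yields the quadratic equation:

$\lambda^2 - \lambda + 0.21 = 0$

The two values of $\lambda$ that solve the equation are $\lambda = 0.7$ and $\lambda = 0.3$. Hence, 0.7
and 0.3 are the two characteristic roots.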

Example 2

Now change A such that each element in column 2 is twice the corresponding
value in column 1. Specifically,

$A = \begin{bmatrix} 0.5 & 1 \\ -0.2 & -0.4 \end{bmatrix}$

Now,

$|A - \lambda I| = \begin{vmatrix} 0.5 - \lambda & 1 \\ -0.2 & -0.4 - \lambda \end{vmatrix}$

Again, there are two values of $\lambda$ that solve $|A - \lambda I| = 0$. Solving the quadratic
equation $\lambda^2 - 0.1\lambda = 0$ yields the two characteristic roots $\lambda_1 = 0$ and $\lambda_2 = 0.1$.
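For a (2 × 2) matrix, the characteristic equation is $\lambda^2 - (a_{11} + a_{22})\lambda + |A| = 0$, so both examples can be checked with the quadratic formula. The sketch below assumes the matrices of Examples 1 and 2 as written above and handles only the real-root case; the function name is illustrative.

```python
import math

def characteristic_roots_2x2(a):
    """Roots of lambda^2 - trace*lambda + det = 0 for a 2x2 matrix (real roots only)."""
    (a11, a12), (a21, a22) = a
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = trace * trace - 4.0 * det
    if disc < 0:
        raise ValueError("complex roots are not handled in this sketch")
    root = math.sqrt(disc)
    return (trace + root) / 2.0, (trace - root) / 2.0

example_1 = [[0.5, -0.2], [-0.2, 0.5]]
example_2 = [[0.5, 1.0], [-0.2, -0.4]]  # column 2 is twice column 1

roots_1 = characteristic_roots_2x2(example_1)
roots_2 = characteristic_roots_2x2(example_2)
print("Example 1 roots:", [round(r, 6) for r in roots_1])
print("Example 2 roots:", [round(r, 6) for r in roots_2])
```

In Example 2 the linear dependence between the columns forces the determinant, and hence one root, to zero.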
Characteristic Equations 


Equation (A6.3) is called the characteristic equation of the square matrix A. Notice
that the characteristic equation will be an nth-order polynomial in $\lambda$. The reason is
that the determinant $|A - \lambda I| = 0$ contains the nth degree term $\lambda^n$ resulting from
the expression:

$(a_{11} - \lambda)(a_{22} - \lambda)(a_{33} - \lambda) \cdots (a_{nn} - \lambda)$

As such, the characteristic equation will be an nth-order polynomial of the form:

$\lambda^n + b_1\lambda^{n-1} + b_2\lambda^{n-2} + b_3\lambda^{n-3} + \cdots + b_{n-1}\lambda + b_n = 0$    (A6.4)

From (A6.4), it immediately follows that an (n × n) square matrix will necessarily
have n characteristic roots. As we saw in Chapter 1, some of the roots may be
repeating and some may be complex. In practice, it is not necessary to actually
calculate the values of the roots solving (A6.4). The necessary and sufficient conditions
for all characteristic roots to lie within the unit circle are given in the appendix to
Chapter 1.

Notice that the term $b_n$ is of particular relevance since $b_n = (-1)^n|A|$. After all,
$b_n$ is the only expression resulting from $|A - \lambda I|$ that is not multiplied by $\lambda$. In
terms of (A6.4), the expressions $\lambda^n$ and $b_n$ will have the same sign if n is even and
opposite signs if n is odd. In Example 1, the characteristic equation is $\lambda^2 - \lambda + 0.21 = 0$,
so that $b_2 = 0.21$. Since $|A| = 0.21$, it follows that $b_2 = (-1)^2(0.21)$. Similarly,
in Example 2, the characteristic equation is $\lambda^2 - 0.1\lambda = 0$, so that $b_2 = 0$. Since it is
also the case that $|A| = 0$, it also follows that $b_2 = (-1)^2|A|$. In Example 3 below,
we consider the case in which n = 3.
Example 3 
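Let A be such that

$$|A - \lambda I| = \begin{vmatrix} 0.5 - \lambda & 0.2 & 0.2 \\ 0.2 & 0.5 - \lambda & 0.2 \\ 0.2 & 0.2 & 0.5 - \lambda \end{vmatrix}$$

The characteristic equation is

$$\lambda^3 - 1.5\lambda^2 + 0.63\lambda - 0.081 = 0$$

and the characteristic roots are

$$\lambda_1 = 0.9, \quad \lambda_2 = 0.3, \quad \lambda_3 = 0.3$$

The determinant of A is 0.081, so that $b_3 = -0.081 = (-1)^3|A|$.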
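The relation $b_n = (-1)^n|A|$ can be checked numerically for Example 3. The following is a minimal sketch, assuming NumPy; it relies on `np.poly`, which for a square array returns the coefficients of $\det(\lambda I - A)$ with the highest power first, so the last coefficient plays the role of $b_n$ in (A6.4). The variable names are illustrative.

```python
import numpy as np

# Matrix of Example 3.
A_ex3 = np.array([[0.5, 0.2, 0.2],
                  [0.2, 0.5, 0.2],
                  [0.2, 0.2, 0.5]])

# Coefficients [1, b_1, ..., b_n] of the characteristic polynomial.
coeffs = np.poly(A_ex3)

n = A_ex3.shape[0]
b_n = coeffs[-1]                 # constant term of (A6.4)
det_A = np.linalg.det(A_ex3)     # determinant of A

print(np.round(coeffs, 3))
print(np.isclose(b_n, (-1) ** n * det_A))
```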
Determinants and Characteristic Roots 


The determinant of an (n × n) matrix is equal to the product of its characteristic roots, that is

$$|A| = \prod_{i=1}^{n} \lambda_i \tag{A6.5}$$

where $\lambda_1, \lambda_2, \ldots, \lambda_n$ = the n characteristic roots of the (n × n) matrix A


The proof of this important proposition is straightforward since the values $\lambda_1, \lambda_2, \ldots, \lambda_n$ solve (A6.4). From the algebra of polynomials, the product of the roots of (A6.4) is equal to $(-1)^n b_n$:

$$\prod_{i=1}^{n} \lambda_i = (-1)^n b_n$$

From the second section above, we also know that $(-1)^n b_n = |A|$. Hence, (A6.5) must hold in that the product $(\lambda_1)(\lambda_2) \cdots (\lambda_n) = (-1)^n b_n = |A|$.

Examples 1 to 3 continued: 
In Examples 1 and 2, the characteristic equation is quadratic of the form $\lambda^2 + b_1\lambda + b_2 = 0$. To find the roots of this quadratic equation, we seek the factors $\lambda_1$ and $\lambda_2$ such that

$$(\lambda - \lambda_1)(\lambda - \lambda_2) = 0$$

or

$$\lambda^2 - (\lambda_1 + \lambda_2)\lambda + \lambda_1\lambda_2 = 0$$

Clearly, the value $\lambda_1\lambda_2$ must equal $b_2$. To check the formulas in Example 1, recall that the characteristic equation is $\lambda^2 - \lambda + 0.21 = 0$. In this problem, the value of $b_2$ is 0.21, the product of the characteristic roots is $\lambda_1\lambda_2 = (0.7)(0.3) = 0.21$, and the determinant of A is $(0.5)^2 - (0.2)^2 = 0.21$. In Example 2, the characteristic equation is $\lambda^2 - 0.1\lambda = 0$, so that $b_2 = 0$. The product of the characteristic roots is $\lambda_1\lambda_2 = (0.0)(0.1) = 0.0$, and the determinant of A is $(0.5)(-0.4) - (1)(-0.2) = 0$.

In Example 3, the characteristic equation is cubic: $\lambda^3 - 1.5\lambda^2 + 0.63\lambda - 0.081 = 0$. The value of $b_3$ is -0.081, the product of the characteristic roots is (0.9)(0.3)(0.3) = 0.081, and the determinant of A is 0.081.
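The same comparison of the determinant with the product of the characteristic roots can be run for all three examples at once. This is only a sketch, assuming NumPy; the dictionary `examples` and the helper `det_and_root_product` are hypothetical names introduced for illustration.

```python
import numpy as np

# Matrices of Examples 1 to 3, as read from the determinants in the text.
examples = {
    "Example 1": np.array([[0.5, -0.2], [-0.2, 0.5]]),
    "Example 2": np.array([[0.5, 1.0], [-0.2, -0.4]]),
    "Example 3": np.array([[0.5, 0.2, 0.2],
                           [0.2, 0.5, 0.2],
                           [0.2, 0.2, 0.5]]),
}


def det_and_root_product(a):
    """Return (|A|, product of the characteristic roots) for a square matrix."""
    roots = np.linalg.eigvals(a)
    return np.linalg.det(a), np.prod(roots).real


for name, matrix in examples.items():
    det_a, product = det_and_root_product(matrix)
    print(name, round(det_a, 6), round(product, 6))
```

In each case the two numbers should coincide up to rounding, as (A6.5) requires.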

Characteristic Roots and Rank 

The rank of a square (n × n) matrix A is the number of linearly independent rows (columns) in the matrix. The notation rank(A) = r means that the rank of A is equal to r. The matrix A is said to be of full rank if rank(A) = n.




From the discussion above, it follows that the rank of A is equal to the number of its nonzero characteristic roots. Certainly, if all rows of A are linearly independent, the determinant of A is not equal to zero. From (A6.5), it follows that none of the characteristic roots can equal zero if $|A| \neq 0$. At the other extreme, if rank(A) = 0, each element of A must equal zero. When rank(A) = 0, the characteristic equation degenerates into $\lambda^n = 0$ with the solutions $\lambda_1 = \lambda_2 = \cdots = \lambda_n = 0$. Consider the intermediate cases in which 0 < rank(A) = r < n. Since interchanging the various rows of a matrix does not alter the absolute value of its determinant, we can always rewrite $|A - \lambda I| = 0$ such that the first r rows comprise the r linearly independent rows of A. The determinant of these first r rows will contain r characteristic roots. The other (n - r) roots will be zeros.

In Example 2, rank(A) = 1 since each element in row 1 equals -2.5 times the corresponding element in row 2. For this case, $|A| = 0$ and exactly one characteristic root is equal to zero. In the other two examples, A is of full rank and all characteristic roots differ from zero.

Example 4 
Now consider a (3 × 3) matrix A such that rank(A) = 1. Let

$$|A - \lambda I| = \begin{vmatrix} 0.5 - \lambda & 0.2 & 0.2 \\ 1 & 0.4 - \lambda & 0.4 \\ -0.25 & -0.1 & -0.1 - \lambda \end{vmatrix}$$

The rank of A is unity since row 2 is twice row 1 and row 3 is -0.5 times row 1. The determinant of A equals zero and the characteristic equation is given by

$$\lambda^3 - 0.8\lambda^2 = 0$$

The three characteristic roots are $\lambda_1 = 0.8$, $\lambda_2 = 0$, and $\lambda_3 = 0$.
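For Example 4, the rank and the number of nonzero characteristic roots can be compared numerically. The sketch below assumes NumPy; the helper `count_nonzero_roots` and its tolerance `tol` are assumptions made here, since computed roots that are zero in theory come out as tiny floating-point numbers rather than exact zeros.

```python
import numpy as np

# Matrix of Example 4: row 2 is twice row 1 and row 3 is -0.5 times row 1.
A_ex4 = np.array([[0.5, 0.2, 0.2],
                  [1.0, 0.4, 0.4],
                  [-0.25, -0.1, -0.1]])


def count_nonzero_roots(a, tol=1e-8):
    """Count characteristic roots whose modulus exceeds tol (tol is an assumed cutoff)."""
    return int(np.sum(np.abs(np.linalg.eigvals(a)) > tol))


rank_ex4 = int(np.linalg.matrix_rank(A_ex4))   # number of linearly independent rows
nonzero_ex4 = count_nonzero_roots(A_ex4)       # number of roots that differ from zero

print(rank_ex4, nonzero_ex4)
```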


Stability of a First-Order VAR 
Let $x_t$ be the (n × 1) vector $(x_{1t}, x_{2t}, \ldots, x_{nt})'$ and consider the first-order VAR

$$x_t = A_0 + A_1 x_{t-1} + \varepsilon_t \tag{A6.6}$$

where $A_0$ = an (n × 1) vector with elements $a_{i0}$
$A_1$ = an (n × n) square matrix with elements $a_{ij}$
$\varepsilon_t$ = the (n × 1) vector of white-noise disturbances $(\varepsilon_{1t}, \varepsilon_{2t}, \ldots, \varepsilon_{nt})'$


To check the stability of the system, we need only examine the homogeneous equation:

$$x_t = A_1 x_{t-1} \tag{A6.7}$$


We can use the method of undetermined coefficients and for each $x_{it}$ posit a solution of the form:

$$x_{it} = c_i\lambda^t \tag{A6.8}$$

where $c_i$ = an arbitrary constant


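If (A6.8) is to be a solution, it must satisfy each of the n equations represented by (A6.7). Substituting $x_{it} = c_i\lambda^t$ and $x_{it-1} = c_i\lambda^{t-1}$ for each of the $x_{it}$ in (A6.7), we get

$$\begin{aligned}
c_1\lambda^t &= a_{11}c_1\lambda^{t-1} + a_{12}c_2\lambda^{t-1} + a_{13}c_3\lambda^{t-1} + \cdots + a_{1n}c_n\lambda^{t-1} \\
c_2\lambda^t &= a_{21}c_1\lambda^{t-1} + a_{22}c_2\lambda^{t-1} + a_{23}c_3\lambda^{t-1} + \cdots + a_{2n}c_n\lambda^{t-1} \\
c_3\lambda^t &= a_{31}c_1\lambda^{t-1} + a_{32}c_2\lambda^{t-1} + a_{33}c_3\lambda^{t-1} + \cdots + a_{3n}c_n\lambda^{t-1} \\
&\;\;\vdots \\
c_n\lambda^t &= a_{n1}c_1\lambda^{t-1} + a_{n2}c_2\lambda^{t-1} + a_{n3}c_3\lambda^{t-1} + \cdots + a_{nn}c_n\lambda^{t-1}
\end{aligned}$$

Now, divide each equation by $\lambda^{t-1}$ and collect terms to form

$$\begin{aligned}
c_1(a_{11} - \lambda) + c_2a_{12} + c_3a_{13} + \cdots + c_na_{1n} &= 0 \\
c_1a_{21} + c_2(a_{22} - \lambda) + c_3a_{23} + \cdots + c_na_{2n} &= 0 \\
&\;\;\vdots \\
c_1a_{n1} + c_2a_{n2} + c_3a_{n3} + \cdots + c_n(a_{nn} - \lambda) &= 0
\end{aligned}$$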


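so that the following system of equations must be satisfied:

$$\begin{bmatrix}
a_{11} - \lambda & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} - \lambda & a_{23} & \cdots & a_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} - \lambda
\end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

For a nontrivial solution to the system of equations, the following determinant must equal zero:

$$\begin{vmatrix}
(a_{11} - \lambda) & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & (a_{22} - \lambda) & a_{23} & \cdots & a_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & (a_{nn} - \lambda)
\end{vmatrix} = 0$$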
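To make the condition concrete, consider a two-variable illustration. The choice of $A_1$ below is an assumption made purely for illustration: it reuses the (2 × 2) matrix with diagonal elements 0.5 and off-diagonal elements -0.2, whose characteristic equation is $\lambda^2 - \lambda + 0.21 = 0$.

```latex
% Illustrative case n = 2 with a_{11} = a_{22} = 0.5 and a_{12} = a_{21} = -0.2
\begin{aligned}
c_1(0.5 - \lambda) - 0.2\,c_2 &= 0 \\
-0.2\,c_1 + c_2(0.5 - \lambda) &= 0
\end{aligned}
\qquad\Longrightarrow\qquad
\begin{vmatrix} 0.5 - \lambda & -0.2 \\ -0.2 & 0.5 - \lambda \end{vmatrix}
= \lambda^2 - \lambda + 0.21 = 0

% The two roots and the restriction each places on the constants:
\lambda_1 = 0.7:\quad -0.2\,c_1 - 0.2\,c_2 = 0 \;\Rightarrow\; c_2 = -c_1
\qquad
\lambda_2 = 0.3:\quad 0.2\,c_1 - 0.2\,c_2 = 0 \;\Rightarrow\; c_2 = c_1
```

For each root the two equations collapse to a single restriction, so the constants are determined only up to a scale factor; this is what a nontrivial solution means here.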


The determinant will be an nth-order polynomial that is satisfied by n values of $\lambda$. Denote these n characteristic roots by $\lambda_1, \lambda_2, \ldots, \lambda_n$. Since each is a solution to the homogeneous equation, we know that the following linear combination of the homogeneous solutions is also a homogeneous solution:

$$x_{it} = d_1\lambda_1^t + d_2\lambda_2^t + \cdots + d_n\lambda_n^t$$

Note that each $\{x_{it}\}$ sequence will have the same roots. The necessary and sufficient condition for stability is that all characteristic roots lie within the unit circle.

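The stability condition translates into a short numerical check: compute the characteristic roots of $A_1$ and confirm that each has modulus less than one. The sketch below assumes NumPy; the function names `is_stable` and `iterate_homogeneous`, and the matrix `unstable_A1`, are hypothetical and introduced only for illustration.

```python
import numpy as np


def is_stable(a1):
    """True if every characteristic root of a1 lies strictly inside the unit circle."""
    return bool(np.all(np.abs(np.linalg.eigvals(a1)) < 1.0))


def iterate_homogeneous(a1, x0, periods):
    """Iterate the homogeneous equation x_t = A_1 x_{t-1} forward from x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(periods):
        x = a1 @ x
    return x


stable_A1 = np.array([[0.5, -0.2], [-0.2, 0.5]])   # roots 0.7 and 0.3
unstable_A1 = np.array([[0.8, 0.4], [0.4, 0.8]])   # hypothetical; roots 1.2 and 0.4

print(is_stable(stable_A1), is_stable(unstable_A1))
print(iterate_homogeneous(stable_A1, [1.0, 0.0], 200))    # decays toward zero
print(iterate_homogeneous(unstable_A1, [1.0, 0.0], 200))  # grows without bound
```

Iterating the homogeneous equation is not needed to establish stability; it is included only to show the decay or explosion that the roots imply.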
Cointegration and Rank 
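The relationship between the rank of a matrix and its characteristic roots is critical in the Johansen procedure. Using the notation from Section 7, let:

$$x_t = A_1 x_{t-1} + \varepsilon_t$$

so that

$$\begin{aligned}
\Delta x_t &= (A_1 - I)x_{t-1} + \varepsilon_t \\
&= \pi x_{t-1} + \varepsilon_t
\end{aligned}$$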


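If the rank of $\pi$ is unity, all rows of $\pi$ can be written as a scalar multiple of the first. Thus, each of the $\{\Delta x_{it}\}$ sequences can be written as

$$\Delta x_{it} = s_i(\pi_{11}x_{1t-1} + \pi_{12}x_{2t-1} + \cdots + \pi_{1n}x_{nt-1}) + \varepsilon_{it}$$

where $s_1 = 1$
$s_i = \pi_{ij}/\pi_{1j}$

Hence, the linear combination $\pi_{11}x_{1t-1} + \pi_{12}x_{2t-1} + \cdots + \pi_{1n}x_{nt-1} = (\Delta x_{it} - \varepsilon_{it})/s_i$ is stationary since both $\Delta x_{it}$ and $\varepsilon_{it}$ are stationary.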

The rank of $\pi$ equals the number of cointegrating vectors. If rank($\pi$) = r, there are r linearly independent combinations of the $\{x_{it}\}$ sequences that are stationary. If rank($\pi$) = n, all variables are stationary.

The rank of $\pi$ is equal to the number of its characteristic roots that differ from zero. Order the roots such that $\lambda_1 > \lambda_2 > \cdots > \lambda_n$. The Johansen methodology allows you to determine the number of roots that are statistically different from zero. The relationship between $A_1$ and $\pi$ is such that if all characteristic roots of $A_1$ are in the unit circle, $\pi$ is of full rank.
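The link between the roots of $A_1$ and the rank of $\pi = A_1 - I$ can be illustrated numerically. The sketch below assumes NumPy, and both matrices are illustrative choices rather than estimates from any data set: `A1_unit_root` is a hypothetical matrix with one root equal to unity, and `A1_stable` has both roots inside the unit circle.

```python
import numpy as np

# Hypothetical A_1 with roots 1.0 and 0.0 (one unit root).
A1_unit_root = np.array([[0.5, 0.5],
                         [0.5, 0.5]])
# Illustrative A_1 with roots 0.7 and 0.3 (both inside the unit circle).
A1_stable = np.array([[0.5, -0.2],
                      [-0.2, 0.5]])


def pi_matrix(a1):
    """Return pi = A_1 - I for a square matrix A_1."""
    return a1 - np.eye(a1.shape[0])


pi_unit = pi_matrix(A1_unit_root)
pi_stable = pi_matrix(A1_stable)

rank_unit = int(np.linalg.matrix_rank(pi_unit))      # reduced rank
rank_stable = int(np.linalg.matrix_rank(pi_stable))  # full rank

# Each characteristic root of pi equals a root of A_1 minus one.
roots_A1 = np.sort(np.linalg.eigvals(A1_unit_root).real)
roots_pi = np.sort(np.linalg.eigvals(pi_unit).real)

print(rank_unit, rank_stable)
print(roots_A1, roots_pi)
```

In the unit-root case the second row of $\pi$ is -1 times the first, so $s_2 = -1$ in the notation above and the single linear combination $-0.5x_{1t-1} + 0.5x_{2t-1}$ appears in both equations.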


STATISTICAL TABLES 


Table A Empirical Cumulative Distribution of τ

                          Probability of a Smaller Value
Sample Size    0.01   0.025    0.05    0.10    0.90    0.95   0.975    0.99

No Constant or Time (a0 = a2 = 0): τ
25            -2.66   -2.26   -1.95   -1.60    0.92    1.33    1.70    2.16
50            -2.62   -2.25   -1.95   -1.61    0.91    1.31    1.66    2.08
100           -2.60   -2.24   -1.95   -1.61    0.90    1.29    1.64    2.03
250           -2.58   -2.23   -1.95   -1.62    0.89    1.29    1.63    2.01
500           -2.58   -2.23   -1.95   -1.62    0.89    1.28    1.62    2.00
∞             -2.58   -2.23   -1.95   -1.62    0.89    1.28    1.62    2.00

Constant (a2 = 0): τ_μ
25            -3.75   -3.33   -3.00   -2.62   -0.37    0.00    0.34    0.72
50            -3.58   -3.22   -2.93   -2.60   -0.40   -0.03    0.29    0.66
100           -3.51   -3.17   -2.89   -2.58   -0.42   -0.05    0.26    0.63
250           -3.46   -3.14   -2.88   -2.57   -0.42   -0.06    0.24    0.62
500           -3.44   -3.13   -2.87   -2.57   -0.43   -0.07    0.24    0.61
∞             -3.43   -3.12   -2.86   -2.57   -0.44   -0.07    0.23    0.60

Constant + Time: τ_τ
25            -4.38   -3.95   -3.60   -3.24   -1.14   -0.80   -0.50   -0.15
50            -4.15   -3.80   -3.50   -3.18   -1.19   -0.87   -0.58   -0.24
100           -4.04   -3.73   -3.45   -3.15   -1.22   -0.90   -0.62   -0.28
250           -3.99   -3.69   -3.43   -3.13   -1.23   -0.92   -0.64   -0.31
500           -3.98   -3.68   -3.42   -3.13   -1.24   -0.93   -0.65   -0.32
∞             -3.96   -3.66   -3.41   -3.12   -1.25   -0.94   -0.66   -0.33

Source: This table was constructed by David A. Dickey using Monte Carlo methods. Standard errors of the estimates vary, but most are less than 0.20. The table is reproduced from Wayne Fuller, Introduction to Statistical Time Series (New York: John Wiley, 1976).


